2025 Paper 3 Q11

Year: 2025
Paper: 3
Question Number: 11

Course: UFM Statistics
Section: Cumulative distribution functions

Difficulty: 1500.0 Banger: 1500.0

Problem

  1. Let \(\lambda > 0\). The independent random variables \(X_1, X_2, \ldots, X_n\) all have probability density function $$f(t) = \begin{cases} \lambda e^{-\lambda t} & t \geq 0 \\ 0 & t < 0 \end{cases}$$ and cumulative distribution function \(F(x)\). The value of random variable \(Y\) is the largest of the values \(X_1, X_2, \ldots, X_n\). Show that the cumulative distribution function of \(Y\) is given, for \(y \geq 0\), by $$G(y) = (1 - e^{-\lambda y})^n$$
  2. The values \(L(\alpha)\) and \(U(\alpha)\), where \(0 < \alpha \leq \frac{1}{2}\), are such that $$P(Y < L(\alpha)) = \alpha \text{ and } P(Y > U(\alpha)) = \alpha$$ Show that $$L(\alpha) = -\frac{1}{\lambda}\ln(1 - \alpha^{1/n})$$ and write down a similar expression for \(U(\alpha)\).
  3. Use the approximation \(e^t \approx 1 + t\), for \(|t|\) small, to show that, for sufficiently large \(n\), $$\lambda L(\alpha) \approx \ln(n) - \ln\left(\ln\left(\frac{1}{\alpha}\right)\right)$$
  4. Hence show that the median of \(Y\) tends to infinity as \(n\) increases, but that the width of the interval \(U(\alpha) - L(\alpha)\) tends to a value which is independent of \(n\).
  5. You are given that, for \(|t|\) small, \(\ln(1 + t) \approx t\) and that \(e^3 \approx 20\). Show that, for sufficiently large \(n\), there is an interval of width approximately \(4\lambda^{-1}\) in which \(Y\) lies with probability \(0.9\).

Solution

  1. For \(y \geq 0\), \(\displaystyle F(y) = \mathbb{P}(X_i < y) = \int_0^y \lambda e^{-\lambda t} \,\mathrm{d}t = 1-e^{-\lambda y}\). Since \(Y\) is the largest of the \(X_i\), \begin{align*} G(y) &= \mathbb{P}(Y < y) \\ &= \mathbb{P}(\max_i(X_i) < y) \\ &= \mathbb{P}(X_i < y \text{ for all }i) \\ &= \prod_{i=1}^n \mathbb{P}(X_i < y) \tag{by independence} \\ &= \prod_{i=1}^n (1-e^{-\lambda y})\\ &= (1-e^{-\lambda y})^n \end{align*} as required.
  2. \begin{align*} && \mathbb{P}(Y < L(\alpha)) &= \alpha \\ \Rightarrow && (1-e^{-\lambda L(\alpha)})^n &= \alpha \\ \Rightarrow && 1-e^{-\lambda L(\alpha)} &= \alpha^{\tfrac1n} \\ \Rightarrow && L(\alpha) &= -\frac{1}{\lambda}\ln \left (1-\alpha^{\tfrac1n} \right) \end{align*} Notice also: \begin{align*} && \mathbb{P}(Y > U(\alpha)) &= \alpha \\ \Rightarrow && 1 - (1-e^{-\lambda U(\alpha)})^n &= \alpha \\ \Rightarrow && U(\alpha) &= -\frac{1}{\lambda}\ln \left ( 1-(1-\alpha)^{\tfrac1n} \right) \end{align*}
  3. \begin{align*} \lambda L(\alpha) &= -\ln \left (1-\alpha^{\tfrac1n} \right) \\ &= -\ln \left (1-e^{\tfrac1n \ln \alpha} \right) \\ &\approx - \ln \left ( 1 - \left( 1 + \frac1n \ln \alpha \right) \right) \tag{\(e^t \approx 1 + t\)} \\ &= -\ln \left ( \frac{1}{n} \ln \frac{1}{\alpha} \right) \\ &= - \ln \frac{1}{n} - \ln \left ( \ln \frac{1}{\alpha} \right )\\ &= \ln n - \ln \left ( \ln \left ( \frac{1}{\alpha} \right ) \right) \end{align*} where the approximation is valid because \(\frac{\ln \alpha}{n}\) is small when \(n\) is large.
  4. The median \(M\) satisfies \(\mathbb{P}(Y < M) = \frac12\), so \(M = L(\frac12)\). By the previous part, \(M \approx \frac{\ln n - \ln (\ln 2)}{\lambda} \to \infty\) as \(n \to \infty\). The expression for \(U(\alpha)\) is that for \(L(\alpha)\) with \(\alpha\) replaced by \(1-\alpha\), so the same argument gives \begin{align*} && \lambda U(\alpha) &\approx \ln n - \ln \left ( \ln \left ( \frac{1}{1-\alpha} \right ) \right) \\ \Rightarrow && \lambda(U(\alpha) - L(\alpha)) &\approx -\ln \left ( \ln \left ( \frac{1}{1-\alpha} \right ) \right)+ \ln \left ( \ln \left ( \frac{1}{\alpha} \right ) \right) \\ \Rightarrow && U(\alpha) - L(\alpha) &\to \frac{1}{\lambda} \left ( \ln \left ( \ln \left ( \frac{1}{\alpha} \right ) \right)-\ln \left ( \ln \left ( \frac{1}{1-\alpha} \right ) \right ) \right) \end{align*} which doesn't depend on \(n\).
  5. Take \(\alpha = \frac{1}{20}\), so that \(\mathbb{P}(L(\alpha) < Y < U(\alpha)) = 1 - 2\alpha = 0.9\). Then, for sufficiently large \(n\), \begin{align*} U(\alpha) - L(\alpha) &\approx \frac{1}{\lambda} \left (\ln \ln 20 - \ln \ln \frac{20}{19} \right) \\ &= \lambda^{-1} \left (\ln \ln 20 - \ln \ln \left (1 + \frac{1}{19} \right ) \right) \\ &\approx \lambda^{-1} \left (\ln 3 - \ln \frac{1}{19} \right) \tag{\(\ln(1+t) \approx t\), \(e^3 \approx 20\)} \\ &= \lambda^{-1} \left ( \ln 3 + \ln 19 \right ) \\ &\approx \lambda^{-1} (1 + 3) \tag{\(\ln 3 \approx 1\), \(\ln 19 \approx \ln 20 \approx 3\)} \\ &= 4\lambda^{-1} \end{align*} [Note that \(\ln \ln 20 - \ln \ln \frac{20}{19} = 4.0673\ldots\)]
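The formulas in the solution can be checked numerically. The sketch below is only an illustration, not part of the original solution: the function names (`cdf_max`, `lower`, `upper`, `limiting_width`) and the value `lam = 2.0` are assumptions chosen for the example. It evaluates the exact \(L(\alpha)\) and \(U(\alpha)\) for increasing \(n\), confirms that the interval between them has probability \(1 - 2\alpha\), and compares its width with the limiting value.

```python
import math

def cdf_max(y, lam, n):
    """G(y) = (1 - exp(-lam*y))**n, the CDF of Y for y >= 0."""
    return (-math.expm1(-lam * y)) ** n

def lower(alpha, lam, n):
    """Exact L(alpha), defined by P(Y < L) = alpha."""
    # 1 - alpha**(1/n) is written as -expm1(ln(alpha)/n) to limit rounding error.
    return -math.log(-math.expm1(math.log(alpha) / n)) / lam

def upper(alpha, lam, n):
    """Exact U(alpha), defined by P(Y > U) = alpha."""
    # 1 - (1 - alpha)**(1/n), again via expm1/log1p for accuracy.
    return -math.log(-math.expm1(math.log1p(-alpha) / n)) / lam

def limiting_width(alpha, lam):
    """Large-n limit of U(alpha) - L(alpha)."""
    return (math.log(math.log(1 / alpha)) - math.log(-math.log1p(-alpha))) / lam

lam, alpha = 2.0, 0.05  # illustrative values, not taken from the question
for n in (10, 100, 10_000):
    lo, hi = lower(alpha, lam, n), upper(alpha, lam, n)
    coverage = cdf_max(hi, lam, n) - cdf_max(lo, lam, n)
    print(n, round(lo, 4), round(hi, 4), round(hi - lo, 4), round(coverage, 4))
print("limiting width:", round(limiting_width(alpha, lam), 4))
```

Both endpoints grow like \(\ln(n)/\lambda\), while the width should settle near \(4\lambda^{-1}\), which is about 2 for the assumed \(\lambda = 2\).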
Examiner's report
— 2025 STEP 3, Question 11
Mean: ~9/20 (inferred) · Average
Parts (i) and (ii) were well done, and candidates made good progress on (iv) and (v) using the given answer from (iii), but there were common errors in the precision of approximations and in following 'hence' instructions.

Parts (i) and (ii) were generally well done. Candidates were often able to make good progress with parts (iv) and (v) even if they had found difficulty with part (iii) (since the answer to part (iii) was given in the question). In part (iii), many candidates incorrectly assumed that α^(1/n) → 0, leading to the incorrect approximation ln(1 − α^(1/n)) ≈ −α^(1/n). A significant number of candidates ignored the word 'hence' in part (iv), either not realising that L(1/2) was the median or instead solving G(m) = 1/2 directly to find the median m. Most candidates who attempted part (v) focused entirely on estimating the size of U(0.05) − L(0.05), without ever stating that P(L(0.05) < Y < U(0.05)) = 0.9. In parts (iii), (iv) and (v), a number of candidates did not give sufficient precision in the use of approximations/limits, for example writing asymptotic results as equalities which held for all n. Approximately half of the candidates implicitly utilised the identity U(α) = L(1 − α). Whilst, formally, the bound 0 < α ≤ 1/2 given in the question invalidated this method unless the range of the arguments of L and U were first extended to 0 < α < 1, the identity allowed candidates to save considerable repetition of work and candidates who employed this method were not penalised on account of this technical subtlety.
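The error in part (iii) described above can be seen numerically: α^(1/n) tends to 1, not 0, as n grows, so ln(1 − x) ≈ −x cannot be applied with x = α^(1/n). The following is a minimal sketch under assumed names (`exact_lambda_L`, `part_iii_approx`, `mistaken_approx` are hypothetical helpers, and α = 0.5 is an illustrative choice) comparing the exact value of λL(α) with the approximation from part (iii) and with the mistaken one.

```python
import math

def exact_lambda_L(alpha, n):
    """lambda * L(alpha) = -ln(1 - alpha**(1/n)), from part (ii)."""
    return -math.log(1.0 - alpha ** (1.0 / n))

def part_iii_approx(alpha, n):
    """ln(n) - ln(ln(1/alpha)), the approximation given in part (iii)."""
    return math.log(n) - math.log(math.log(1.0 / alpha))

def mistaken_approx(alpha, n):
    """Result of using ln(1 - x) ~ -x with x = alpha**(1/n); invalid because x is near 1."""
    return alpha ** (1.0 / n)

alpha = 0.5  # illustrative choice: the median case
for n in (1, 10, 100, 1000):
    print(
        n,
        round(alpha ** (1.0 / n), 6),            # alpha^(1/n), approaching 1
        round(exact_lambda_L(alpha, n), 4),      # exact value
        round(part_iii_approx(alpha, n), 4),     # approaches the exact value
        round(mistaken_approx(alpha, n), 4),     # stays below 1
    )
```

The exact value grows without bound with n, whereas the mistaken approximation never exceeds 1.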

The majority of candidates focused solely on the pure questions, with questions 1, 2 and 8 the most popular. The statistics questions were more popular than the mechanics questions in this exam series. Candidates who did well on this paper generally: were careful to explain and justify the steps in their arguments, explaining what they had done rather than expecting the examiner to infer what had been done from disjointed groups of calculations; paid close attention to what was required by the questions; made fewer unnecessary mistakes with calculations; thought carefully about how to present rigorous arguments involving trig functions and their inverse functions, especially in relation to domain considerations; understood that questions set on the STEP papers require sufficient justification to earn full credit; knew the difference between 'positive' and 'non-negative'; attempted all parts of a question, picking up marks for later parts even when they had not necessarily attempted or completed previous parts. Candidates who did less well on this paper generally: did not pay attention to 'Hence' instructions: this means that you must use the previous part; presented explanations that were not precise enough (e.g. in Question 3 describing the transformations but not in the context of the graphs or in Question 8 not explaining use of trigonometric relationships sufficiently well); made additional assumptions, e.g. that a function was differentiable when this had not been given; tried to present if and only if arguments in a single argument when dealing with each direction separately would have been more appropriate and safer (note that this is not always the case; in general candidates need to consider what is the most appropriate presentation of an if and only if argument); tried to carry out too many steps in one go, resulting in them not justifying the key steps sufficiently; did not take sufficient care with graphs/curve sketching.

Source: Cambridge STEP 2025 Examiner's Report · 2025-p3.pdf
Rating Information

Difficulty Rating: 1500.0

Difficulty Comparisons: 0

Banger Rating: 1500.0

Banger Comparisons: 0

Problem source
\begin{questionparts}
\item Let $\lambda > 0$. The independent random variables $X_1, X_2, \ldots, X_n$ all have probability density function 
$$f(t) = \begin{cases} \lambda e^{-\lambda t} & t \geq 0 \\ 0 & t < 0 \end{cases}$$
and cumulative distribution function $F(x)$.
The value of random variable $Y$ is the largest of the values $X_1, X_2, \ldots, X_n$.
Show that the cumulative distribution function of $Y$ is given, for $y \geq 0$, by 
$$G(y) = (1 - e^{-\lambda y})^n$$
\item The values $L(\alpha)$ and $U(\alpha)$, where $0 < \alpha \leq \frac{1}{2}$, are such that 
$$P(Y < L(\alpha)) = \alpha \text{ and } P(Y > U(\alpha)) = \alpha$$
Show that 
$$L(\alpha) = -\frac{1}{\lambda}\ln(1 - \alpha^{1/n})$$
and write down a similar expression for $U(\alpha)$.
\item Use the approximation $e^t \approx 1 + t$, for $|t|$ small, to show that, for sufficiently large $n$,
$$\lambda L(\alpha) \approx \ln(n) - \ln\left(\ln\left(\frac{1}{\alpha}\right)\right)$$
\item Hence show that the median of $Y$ tends to infinity as $n$ increases, but that the width of the interval $U(\alpha) - L(\alpha)$ tends to a value which is independent of $n$.
\item You are given that, for $|t|$ small, $\ln(1 + t) \approx t$ and that $e^3 \approx 20$.
Show that, for sufficiently large $n$, there is an interval of width approximately $4\lambda^{-1}$ in which $Y$ lies with probability $0.9$.
\end{questionparts}
Solution source
\begin{questionparts}
\item For $y \geq 0$, $\displaystyle F(y) = \mathbb{P}(X_i < y) = \int_0^y \lambda e^{-\lambda t} \,\mathrm{d}t = 1-e^{-\lambda y}$.

Notice also that
\begin{align*}
G(y) &= \mathbb{P}(Y < y) \\
&= \mathbb{P}(\max_i(X_i) < y) \\
&= \mathbb{P}(X_i < y \text{ for all }i) \\
&= \prod_{i=1}^n \mathbb{P}(X_i < y) \tag{by independence} \\
&=  \prod_{i=1}^n (1-e^{-\lambda y})\\
&= (1-e^{-\lambda y})^n
\end{align*}
as required.

\item \begin{align*}
&& \mathbb{P}(Y < L(\alpha)) &= \alpha \\
\Rightarrow && (1-e^{-\lambda L(\alpha)})^n &= \alpha \\
\Rightarrow && 1-e^{-\lambda L(\alpha)} &= \alpha^{\tfrac1n} \\
\Rightarrow && L(\alpha) &= -\frac{1}{\lambda}\ln \left (1-\alpha^{\tfrac1n} \right)
\end{align*}

Notice also:

\begin{align*}
&& \mathbb{P}(Y > U(\alpha)) &= \alpha \\
\Rightarrow && 1 - (1-e^{-\lambda U(\alpha)})^n &= \alpha \\
\Rightarrow && U(\alpha) &= -\frac{1}{\lambda}\ln \left ( 1-(1-\alpha)^{\tfrac1n} \right)
\end{align*}

\item \begin{align*}
\lambda L(\alpha) &= -\ln \left (1-\alpha^{\tfrac1n} \right) \\
&= -\ln \left (1-e^{\tfrac1n \ln \alpha} \right) \\
&\approx - \ln \left ( 1 - \left( 1 + \frac1n \ln \alpha \right) \right) \tag{$e^t \approx 1 + t$} \\
&= -\ln \left ( \frac{1}{n} \ln \frac{1}{\alpha} \right) \\
&= - \ln \frac{1}{n} - \ln \left ( \ln \frac{1}{\alpha} \right )\\
&= \ln n - \ln \left ( \ln \left ( \frac{1}{\alpha} \right ) \right)
\end{align*} since if $n$ is large, $\frac{\ln \alpha}{n}$ is small.

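\item The median $M$ satisfies $\mathbb{P}(Y < M) = \frac12$, so $M = L(\frac12)$. By the previous part, $M \approx \frac{\ln n - \ln (\ln 2)}{\lambda} \to \infty$ as $n \to \infty$. The expression for $U(\alpha)$ is that for $L(\alpha)$ with $\alpha$ replaced by $1-\alpha$, so the same argument gives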

\begin{align*}
&& \lambda U(\alpha) &\approx \ln n - \ln \left ( \ln \left ( \frac{1}{1-\alpha} \right ) \right) \\
\Rightarrow && \lambda(U(\alpha) - L(\alpha)) &\approx  -\ln \left ( \ln \left ( \frac{1}{1-\alpha} \right ) \right)+ \ln \left ( \ln \left ( \frac{1}{\alpha} \right ) \right) \\
\Rightarrow && U(\alpha) - L(\alpha) &\to \frac{1}{\lambda} \left (  \ln \left ( \ln \left ( \frac{1}{\alpha} \right ) \right)-\ln \left ( \ln \left ( \frac{1}{1-\alpha} \right ) \right ) \right)
\end{align*} 
which doesn't depend on $n$.

\item Take $\alpha = \frac{1}{20}$, so that $\mathbb{P}(L(\alpha) < Y < U(\alpha)) = 1 - 2\alpha = 0.9$. Then, for sufficiently large $n$,

\begin{align*}
U(\alpha) - L(\alpha) &\approx \frac{1}{\lambda} \left (\ln \ln 20 - \ln \ln \frac{20}{19} \right) \\
&= \lambda^{-1} \left (\ln \ln 20 - \ln \ln \left (1 + \frac{1}{19} \right ) \right) \\
&\approx \lambda^{-1} \left (\ln 3 - \ln \frac{1}{19} \right)  \tag{$\ln(1+t) \approx t$, $e^3 \approx 20$} \\
&= \lambda^{-1} \left ( \ln 3 + \ln 19 \right ) \\
&\approx \lambda^{-1} (1 + 3) \tag{$\ln 3 \approx 1$, $\ln 19 \approx \ln 20 \approx 3$} \\
&= 4\lambda^{-1} 
\end{align*}

[Note that $\ln \ln 20 - \ln \ln \frac{20}{19} = 4.0673\ldots$]
\end{questionparts}