Problems

2025 Paper 3 Q11
D: 1500.0 B: 1500.0

  1. Let \(\lambda > 0\). The independent random variables \(X_1, X_2, \ldots, X_n\) all have probability density function $$f(t) = \begin{cases} \lambda e^{-\lambda t} & t \geq 0 \\ 0 & t < 0 \end{cases}$$ and cumulative distribution function \(F(x)\). The value of the random variable \(Y\) is the largest of the values \(X_1, X_2, \ldots, X_n\). Show that the cumulative distribution function of \(Y\) is given, for \(y \geq 0\), by $$G(y) = (1 - e^{-\lambda y})^n$$
  2. The values \(L(\alpha)\) and \(U(\alpha)\), where \(0 < \alpha \leq \frac{1}{2}\), are such that $$P(Y < L(\alpha)) = \alpha \text{ and } P(Y > U(\alpha)) = \alpha$$ Show that $$L(\alpha) = -\frac{1}{\lambda}\ln(1 - \alpha^{1/n})$$ and write down a similar expression for \(U(\alpha)\).
  3. Use the approximation \(e^t \approx 1 + t\), for \(|t|\) small, to show that, for sufficiently large \(n\), $$\lambda L(\alpha) \approx \ln(n) - \ln\left(\ln\left(\frac{1}{\alpha}\right)\right)$$
  4. Hence show that the median of \(Y\) tends to infinity as \(n\) increases, but that the width of the interval \(U(\alpha) - L(\alpha)\) tends to a value which is independent of \(n\).
  5. You are given that, for \(|t|\) small, \(\ln(1 + t) \approx t\) and that \(e^3 \approx 20\). Show that, for sufficiently large \(n\), there is an interval of width approximately \(4\lambda^{-1}\) in which \(Y\) lies with probability \(0.9\).


Solution:

  1. Note that, for \(y \geq 0\), \(\displaystyle F(y) = \mathbb{P}(X_i < y) = \int_0^y \lambda e^{-\lambda t} \d t = 1-e^{-\lambda y}\). Since \(Y < y\) exactly when every \(X_i < y\), and the \(X_i\) are independent, \begin{align*} G(y) &= \mathbb{P}(Y < y) \\ &= \mathbb{P}(\max_i(X_i) < y) \\ &= \mathbb{P}(X_i < y \text{ for all }i) \\ &= \prod_{i=1}^n \mathbb{P}(X_i < y) \\ &= \prod_{i=1}^n (1-e^{-\lambda y})\\ &= (1-e^{-\lambda y})^n \end{align*} as required.
  2. \begin{align*} && \mathbb{P}(Y < L(\alpha)) &= \alpha \\ \Rightarrow && (1-e^{-\lambda L(\alpha)})^n &= \alpha \\ \Rightarrow && 1-e^{-\lambda L(\alpha)} &= \alpha^{\tfrac1n} \\ \Rightarrow && L(\alpha) &= -\frac{1}{\lambda}\ln \left (1-\alpha^{\tfrac1n} \right) \end{align*} Similarly, \begin{align*} && \mathbb{P}(Y > U(\alpha)) &= \alpha \\ \Rightarrow && 1 - (1-e^{-\lambda U(\alpha)})^n &= \alpha \\ \Rightarrow && 1-e^{-\lambda U(\alpha)} &= (1-\alpha)^{\tfrac1n} \\ \Rightarrow && U(\alpha) &= -\frac{1}{\lambda}\ln \left ( 1-(1-\alpha)^{\tfrac1n} \right) \end{align*} In other words, \(U(\alpha) = L(1-\alpha)\).
  3. \begin{align*} \lambda L(\alpha) &= -\ln \left (1-\alpha^{\tfrac1n} \right) \\ &= -\ln \left (1-e^{\tfrac1n \ln \alpha} \right) \\ &\approx - \ln \left ( 1 - 1 - \frac1n \ln \alpha\right) \tag{\(e^t \approx 1 + t\)} \\ &= -\ln \left ( \frac{1}{n} \ln \frac{1}\alpha \right) \\ &= - \ln \frac{1}{n} - \ln \left ( \ln \frac{1}{\alpha} \right )\\ &= \ln n - \ln \left ( \ln \left ( \frac{1}{\alpha} \right ) \right) \end{align*} since if \(n\) is large, \(\frac{\ln \alpha}{n}\) is small.
  4. The median \(M\) satisfies \(\mathbb{P}(Y < M) = \frac12\), so \(M = L(\frac12) \approx \frac{\ln n - \ln (\ln 2)}{\lambda} \to \infty\) as \(n \to \infty\). Comparing the formulae in part 2, \(U(\alpha) = L(1-\alpha)\), so applying part 3 with \(\alpha\) replaced by \(1-\alpha\) gives \begin{align*} && \lambda U(\alpha) &\approx \ln n - \ln \left ( \ln \left ( \frac{1}{1-\alpha} \right ) \right) \\ \Rightarrow && \lambda(U(\alpha) - L(\alpha)) &\approx -\ln \left ( \ln \left ( \frac{1}{1-\alpha} \right ) \right)+ \ln \left ( \ln \left ( \frac{1}{\alpha} \right ) \right) \\ \Rightarrow && U(\alpha) - L(\alpha) &\to \frac{1}{\lambda} \left ( \ln \left ( \ln \left ( \frac{1}{\alpha} \right ) \right)-\ln \left ( \ln \left ( \frac{1}{1-\alpha} \right ) \right ) \right) \end{align*} which does not depend on \(n\).
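  5. Since \(\mathbb{P}(Y < L(\alpha)) = \mathbb{P}(Y > U(\alpha)) = \alpha\), we have \(\mathbb{P}(L(\alpha) < Y < U(\alpha)) = 1 - 2\alpha\). Taking \(\alpha = \frac{1}{20}\) therefore gives an interval containing \(Y\) with probability \(0.9\), whose width for large \(n\) is \begin{align*} U(\alpha) - L(\alpha) &\approx \frac{1}{\lambda} \left (\ln \ln 20 - \ln \ln \frac{20}{19} \right) \\ &= \lambda^{-1} \left (\ln \ln 20 - \ln \ln \left(1 + \frac{1}{19}\right) \right) \\ &\approx \lambda^{-1} \left (\ln 3 - \ln \frac{1}{19} \right) \tag{\(\ln 20 \approx 3\), \(\ln(1+t) \approx t\)} \\ &= \lambda^{-1} \ln (3 \cdot 19) \\ &\approx \lambda^{-1} \ln (3 \cdot 20) \\ &= \lambda^{-1} (\ln 3 + \ln 20) \\ &\approx \lambda^{-1} (1 + 3) \\ &= 4\lambda^{-1} \end{align*} using \(\ln 3 \approx 1\). [Note that \(\ln \ln 20 - \ln \ln \frac{20}{19} = 4.0673\ldots\)]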
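As a numerical sanity check of parts 1 to 5, here is a minimal Python sketch (standard library only). The function names, the choice \(\lambda = 1\), \(\alpha = \frac{1}{20}\), the values of \(n\), the trial count and the random seed are all illustrative assumptions. It evaluates the exact \(L(\alpha)\) and \(U(\alpha) = L(1-\alpha)\) from part 2, compares \(L(\alpha)\) with the large-\(n\) approximation of part 3, shows the width approaching the \(n\)-independent limit of part 4 (about \(4.07\lambda^{-1}\) for \(\alpha = \frac{1}{20}\)), and uses Monte Carlo draws of \(Y\) to check that \((L, U)\) captures \(Y\) roughly 90% of the time.

```python
import math
import random

# Illustrative sketch: Y = max of n independent Exp(lam) variables.
# lam, alpha, n, trial counts and the seed are assumed values for demonstration.

def cdf_max(y, n, lam):
    """G(y) = (1 - e^{-lam y})^n for y >= 0 (part 1), and 0 for y < 0."""
    if y < 0:
        return 0.0
    return (1.0 - math.exp(-lam * y)) ** n

def lower(alpha, n, lam):
    """L(alpha) = -(1/lam) ln(1 - alpha^(1/n)) (part 2).
    1 - alpha^(1/n) is computed as -expm1(ln(alpha)/n) to avoid cancellation for large n."""
    return -math.log(-math.expm1(math.log(alpha) / n)) / lam

def upper(alpha, n, lam):
    """U(alpha) = L(1 - alpha), from comparing the two formulae in part 2."""
    return lower(1.0 - alpha, n, lam)

def lower_approx(alpha, n, lam):
    """Large-n approximation (part 3): lam * L ~ ln n - ln ln(1/alpha)."""
    return (math.log(n) - math.log(math.log(1.0 / alpha))) / lam

def limiting_width(alpha, lam):
    """Limit of U(alpha) - L(alpha) as n -> infinity (part 4); independent of n."""
    return (math.log(math.log(1.0 / alpha))
            - math.log(math.log(1.0 / (1.0 - alpha)))) / lam

def sample_max(n, lam, trials, rng):
    """Monte Carlo draws of Y = max(X_1, ..., X_n) with X_i ~ Exp(lam)."""
    return [max(rng.expovariate(lam) for _ in range(n)) for _ in range(trials)]

if __name__ == "__main__":
    lam, alpha = 1.0, 1 / 20
    # Exact L, its large-n approximation, and the exact width U - L, for growing n.
    for n in (10, 1_000, 100_000):
        width = upper(alpha, n, lam) - lower(alpha, n, lam)
        print(n, lower(alpha, n, lam), lower_approx(alpha, n, lam), width)
    print("limiting width:", limiting_width(alpha, lam))

    # Empirical coverage of (L, U) for a moderate n; should be close to 1 - 2*alpha.
    rng = random.Random(0)
    n = 50
    ys = sample_max(n, lam, 10_000, rng)
    lo, hi = lower(alpha, n, lam), upper(alpha, n, lam)
    print("empirical P(L < Y < U):", sum(lo < y < hi for y in ys) / len(ys))
```

The median of \(Y\) is `lower(0.5, n, lam)`, which grows like \(\ln n / \lambda\), while the printed widths settle down, matching part 4.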

2005 Paper 3 Q12
D: 1700.0 B: 1516.0

Five independent timers time a runner as she runs four laps of a track. Four of the timers measure the individual lap times, the results of the measurements being the random variables \(T_1\) to \(T_4\), each of which has variance \(\sigma^2\) and expectation equal to the true time for the lap. The fifth timer measures the total time for the race, the result of the measurement being the random variable \(T\) which has variance \(\sigma^2\) and expectation equal to the true race time (which is equal to the sum of the four true lap times). Find a random variable \(X\) of the form \(aT+b(T_1+T_2+T_3+T_4)\), where \(a\) and \(b\) are constants independent of the true lap times, with the two properties:

  1. whatever the true lap times, the expectation of \(X\) is equal to the true race time;
  2. the variance of \(X\) is as small as possible.
Find also a random variable \(Y\) of the form \(cT+d(T_1+T_2+T_3+T_4)\), where \(c\) and \(d\) are constants independent of the true lap times, with the property that, whatever the true lap times, the expectation of \(Y^2\) is equal to \(\sigma^2\). In one particular race, \(T\) takes the value 220 seconds and \((T_1 + T_2 + T_3 + T_4)\) takes the value \(220.5\) seconds. Use the random variables \(X\) and \(Y\) to estimate an interval in which the true race time lies.


Solution: Let the true race time be \(\mu\), so that \(\E[T] = \mu\) and \(\E[T_1+T_2+T_3+T_4] = \mu\). Let \(X = aT + b(T_1 + T_2+T_3+T_4)\); then \(\E[X] = a\E[T] + b\E[T_1+\cdots+T_4] = a \mu + b \mu = (a+b)\mu\), so for \(\E[X] = \mu\) whatever the true lap times, we need \(a+b=1\). Since the five measurements are independent, \begin{align*} && \var[X] &= a^2\var[T] + b^2(\var[T_1] + \var[T_2] + \var[T_3] + \var[T_4]) \\ &&&= a^2\sigma^2 + 4b^2 \sigma^2 \\ &&& = \sigma^2 (a^2 + 4(1-a)^2 ) \\ &&&= \sigma^2 (5a^2 - 8a + 4) \\ &&&= \sigma^2 \left ( 5 \left ( a - \frac45 \right)^2 - \frac{16}{5}+4 \right)\\ &&&= \sigma^2 \left ( 5 \left ( a - \frac45 \right)^2 + \frac{4}{5}\right) \end{align*} Therefore the variance is minimised when \(a = \frac45, b = \frac15\), and then \(\var[X] = \frac45\sigma^2\). Now let \(Y = cT + d(T_1 + T_2+T_3+T_4)\). Since \(T\) is independent of \(T_1+\cdots+T_4\), \(\E[T(T_1+\cdots+T_4)] = \mu^2\), and so \begin{align*} && \E[Y^2] &= \E \left [c^2T^2 + 2cd T(T_1+T_2+T_3+T_4) + d^2(T_1+T_2+T_3+T_4)^2 \right] \\ &&&= c^2 (\mu^2 + \sigma^2) + 2cd \mu^2 + d^2 (\var[T_1 + \cdots + T_4] + \mu^2) \\ &&&= c^2(\mu^2+\sigma^2) + 2cd \mu^2 + d^2(4\sigma^2 + \mu^2) \\ &&&= (c^2 + 2cd + d^2) \mu^2 + (c^2+4d^2) \sigma^2 \\ &&&= (c+d)^2 \mu^2 + (c^2+4d^2) \sigma^2 \end{align*} For this to equal \(\sigma^2\) whatever the value of \(\mu\), we need \begin{align*} && c + d &= 0 \\ && c^2 + 4d^2 &= 1 \\ \Rightarrow && 5c^2 &= 1 \\ \Rightarrow && c &= \pm \frac{1}{\sqrt5}, \quad d = \mp \frac{1}{\sqrt5} \end{align*} so \(Y = \pm\frac{1}{\sqrt5}\left(T - (T_1+T_2+T_3+T_4)\right)\). With the given data, our best estimate for \(\mu\) is \(\frac45 \cdot 220 + \frac15 \cdot 220.5 = 220.1\), and our estimate for \(\sigma^2\) is \(\left( \frac{1}{\sqrt{5}}(220.5-220) \right)^2 = \frac{1}{20}\). Hence \(\var[X] = \frac45\sigma^2 \approx \frac{1}{25}\), so the standard deviation of \(X\) is about \(\frac15\). Taking two standard deviations either side of the estimate gives the interval \((220.1 - 0.4, 220.1 + 0.4) = (219.7, 220.5)\).
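The algebra above can be checked by simulation. The following is a minimal Python sketch under assumptions that are not in the question: the timer errors are taken to be normal, and the true lap times `TRUE_LAPS` and timer standard deviation `SIGMA` are made-up illustrative values (all names here are hypothetical). It reproduces the interval from the given data and checks by Monte Carlo that \(\E[X] = \mu\), \(\var[X] = \frac45\sigma^2\) and \(\E[Y^2] = \sigma^2\).

```python
import math
import random

# ASSUMPTIONS (not from the problem): normal timer errors, and these made-up values.
TRUE_LAPS = [54.8, 55.1, 55.0, 55.1]   # hypothetical true lap times (seconds)
SIGMA = 0.25                           # hypothetical timer standard deviation (seconds)

A, B = 4 / 5, 1 / 5                          # weights in X = A*T + B*S
C, D = 1 / math.sqrt(5), -1 / math.sqrt(5)   # weights in Y = C*T + D*S

def one_race(rng):
    """Simulate the five independent timers: T times the race, S = T_1 + ... + T_4."""
    t = rng.gauss(sum(TRUE_LAPS), SIGMA)
    s = sum(rng.gauss(lap, SIGMA) for lap in TRUE_LAPS)
    return t, s

def estimate_interval(t, s):
    """Estimate the race time by X and sigma^2 by Y^2, then return X +/- 2 sd(X),
    using Var[X] = (4/5) sigma^2."""
    x = A * t + B * s
    sigma2_hat = (C * t + D * s) ** 2
    sd_x = math.sqrt(0.8 * sigma2_hat)
    return x, (x - 2 * sd_x, x + 2 * sd_x)

def simulate(trials, rng):
    """Return the sample mean and variance of X, and the sample mean of Y^2."""
    xs, y2s = [], []
    for _ in range(trials):
        t, s = one_race(rng)
        xs.append(A * t + B * s)
        y2s.append((C * t + D * s) ** 2)
    mean_x = sum(xs) / trials
    var_x = sum((x - mean_x) ** 2 for x in xs) / (trials - 1)
    return mean_x, var_x, sum(y2s) / trials

if __name__ == "__main__":
    print(estimate_interval(220.0, 220.5))   # data from the question
    mean_x, var_x, mean_y2 = simulate(100_000, random.Random(1))
    print("mu =", sum(TRUE_LAPS), "mean of X =", mean_x)
    print("0.8 sigma^2 =", 0.8 * SIGMA ** 2, "variance of X =", var_x)
    print("sigma^2 =", SIGMA ** 2, "mean of Y^2 =", mean_y2)
```

For comparison, using \(T\) alone gives variance \(\sigma^2\) and using \(T_1 + \cdots + T_4\) alone gives \(4\sigma^2\); the weighted combination \(X\) beats both.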

1995 Paper 1 Q14
D: 1516.0 B: 1531.3

  1. Find the maximum value of \(\sqrt{p(1-p)}\) as \(p\) varies between \(0\) and \(1\).
  2. Suppose that a proportion \(p\) of the population is female. In order to estimate \(p\) we pick a sample of \(n\) people at random and find the proportion of them who are female. Find the value of \(n\) which ensures that the chance of our estimate of \(p\) being more than \(0.01\) in error is less than 1%.
  3. Discuss how the required value of \(n\) would be affected if (a) \(p\) were the proportion of people in the population who are left-handed; (b) \(p\) were the proportion of people in the population who are millionaires.


Solution:

  1. \(\,\) \begin{align*} && \sqrt{p(1-p)} &= \sqrt{p-p^2} \\ &&&= \sqrt{\tfrac14-(\tfrac12-p)^2} \\ &&&\leq \sqrt{\tfrac14} = \tfrac12 \end{align*} Therefore the maximum is \(\tfrac12\), attained when \(p=\frac12\).
  2. Let \(q = 1-p\). For large \(n\), our estimate \(\hat{p}\) approximately follows a normal distribution \(N(p, pq/n)\), by either the normal approximation to the binomial or the central limit theorem. We would like \(\mathbb{P}\left ( |\hat{p}-p| > 0.01 \right) < 0.01\); writing \(\hat{p} \approx p + \sqrt{\frac{pq}{n}}Z\) with \(Z \sim N(0,1)\), \begin{align*} && 0.01 &> \mathbb{P}\left ( |\hat{p}-p| > 0.01 \right) \\ &&&\approx\mathbb{P}\left ( \left|\sqrt{\frac{pq}{n}}Z+p-p\right| > 0.01 \right) \\ &&&= \mathbb{P} \left (|Z|>\frac{0.01\sqrt{n}}{\sqrt{pq}}\right) \end{align*} Since \(\mathbb{P}(|Z| > 2.58) \approx 0.01\), we need \(\frac{0.01\sqrt{n}}{\sqrt{pq}}> 2.58\), i.e. \(n > 258^2 pq\). By part 1, \(pq \leq \frac14\), so in the worst case we need \(n > \frac{258^2}{4} = 16\,641\). (A quick mental estimate uses \(258 \approx 256 = 2^8\), giving \(2^{14} \approx 16\,000\).)
  3. (a) If \(p\) is the proportion of left-handed people (perhaps around \(10\%\)), then \(pq \approx \frac{9}{100}\) rather than the worst case \(\frac14\), so a smaller sample suffices: \(n > 258^2 \cdot \frac{9}{100} \approx 6000\). (b) Millionaires form an even smaller proportion of the population, so \(pq\) is smaller still and an even smaller sample is needed. This may seem surprising, since one might expect to need a larger sample to gauge a smaller proportion accurately. The resolution is that the requirement is on the absolute error (\(\pm 0.01\)). For small \(p\), an error of \(0.01\) is a large relative error, so the condition is easier to meet even though the estimate is less precise relative to \(p\).
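The sample-size calculation can be sketched in Python using only the standard library (`statistics.NormalDist`, Python 3.8+). The function names are hypothetical, and the proportions \(0.1\) for left-handers and \(0.01\) for millionaires are illustrative assumptions, not data. The sketch uses the exact two-sided critical value \(z \approx 2.576\) rather than the rounded \(2.58\), so its worst-case answer (about \(16\,600\)) is slightly below \(16\,641\).

```python
import math
from statistics import NormalDist

def max_sqrt_pq(steps=10_000):
    """Grid search for the maximum of sqrt(p(1-p)) over p in [0, 1] (part 1)."""
    return max(math.sqrt(p * (1 - p)) for p in (i / steps for i in range(steps + 1)))

def required_n(p, error=0.01, risk=0.01):
    """Smallest n with P(|p_hat - p| > error) < risk under the normal approximation
    p_hat ~ N(p, p(1-p)/n): we need error * sqrt(n) / sqrt(pq) > z."""
    z = NormalDist().inv_cdf(1 - risk / 2)   # two-sided critical value, about 2.576
    bound = (z / error) ** 2 * p * (1 - p)
    return math.floor(bound) + 1             # strict inequality n > bound

if __name__ == "__main__":
    print("max of sqrt(p(1-p)):", max_sqrt_pq())
    # Illustrative proportions only (assumed, not measured).
    for label, p in [("worst case", 0.5),
                     ("left-handed, ~10%", 0.1),
                     ("millionaires, ~1%", 0.01)]:
        print(label, required_n(p))
```

The normal approximation itself becomes less reliable when \(np\) is small, so for very small \(p\) the computed \(n\) is only a rough guide.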