Problems


2002 Paper 2 Q12
D: 1600.0 B: 1500.6

On \(K\) consecutive days each of \(L\) identical coins is thrown \(M\) times. For each coin, the probability of throwing a head in any one throw is \(p\) (where \(0 < p < 1\)). Show that the probability that on exactly \(k\) of these days more than \(l\) of the coins will each produce fewer than \(m\) heads can be approximated by \[ {K \choose k}q^k(1-q)^{K-k}, \] where \[ q=\Phi\left( \frac{2h-2l-1}{2\sqrt{h} }\right), \ \ \ \ \ \ h=L\Phi\left( \frac{2m-1-2Mp}{2\sqrt{ Mp(1-p)}}\right) \] and \(\Phi(\cdot)\) is the cumulative distribution function of a standard normal variate. Would you expect this approximation to be accurate in the case \(K=7\), \(k=2\), \(L=500\), \(l=4\), \(M=100\), \(m=48\) and \(p=0.6\;\)?


Solution: Let \(H_i\) be the number of heads the \(i\)th coin produces on a given day. Then \(H_i \sim B(M,p)\), and the probability that a given coin produces fewer than \(m\) heads is \(p_h = \P(H_i < m)\). Let \(C\) be the number of coins producing fewer than \(m\) heads on that day; then \(C \sim B(L, p_h)\). The probability that more than \(l\) of the coins produce fewer than \(m\) heads is therefore \(\P(C > l)\). Finally, since the days are independent, the probability that on exactly \(k\) days more than \(l\) of the coins will produce fewer than \(m\) heads is: \[ \binom{K}{k} \cdot \P(C > l)^k \cdot (1-\P(C > l))^{K-k} \]
Let's start by assuming that all our binomials can be approximated by a normal distribution. \(B(M,p) \approx N(Mp, Mp(1-p))\) and so, with a continuity correction: \begin{align*} p_h &= \P(H_i < m) \\ &\approx \P( \sqrt{Mp(1-p)}Z+Mp < m-\frac12) \\ &= \P \l Z < \frac{2m-2Mp-1}{2\sqrt{Mp(1-p)}} \r \\ &= \Phi\l\frac{2m-2Mp-1}{2\sqrt{Mp(1-p)}} \r \end{align*}
Next, \(B(L, p_h) \approx B \l L, \P \l Z < \frac{2m-2Mp-1}{2\sqrt{Mp(1-p)}} \r\r = B(L, \frac{h}{L}) \approx N(h, \frac{h(L-h)}{L})\). Therefore \begin{align*} \P(C > l) &= 1-\P(C \leq l) \\ &\approx 1- \P \l \sqrt{\frac{h(L-h)}{L}} Z + h \leq l+\frac12 \r \\ &= 1 - \P \l Z \leq \frac{2l-2h+1}{2\sqrt{\frac{h(L-h)}{L}}}\r \\ &= 1- \Phi\l \frac{2l-2h+1}{2\sqrt{\frac{h(L-h)}{L}}} \r \\ &= \Phi\l \frac{2h-2l-1}{2\sqrt{\frac{h(L-h)}{L}}} \r \end{align*} Since \(\sqrt{\frac{h(L-h)}{L}} = \sqrt{h}\sqrt{1-\frac{h}{L}}\), if we can approximate \(\sqrt{1-\frac{h}{L}}\) by \(1\) then we obtain the approximation in the question.
Alternatively, \(B(L, \frac{h}{L}) \approx Po(h)\) and \(Po(h) \approx N(h,h)\), so we obtain: \begin{align*} \P(C > l) &= 1-\P(C \leq l) \\ &\approx 1 - \P(\sqrt{h} Z +h \leq l + \frac12) \\ &= 1 - \P \l Z \leq \frac{2l-2h+1}{2\sqrt{h}} \r \\ &= \Phi \l \frac{2h - 2l -1}{2\sqrt{h}}\r \end{align*} as required. [I think this is what the examiners expected].
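As a rough numerical companion to the derivation above, the approximation can be evaluated directly. This is a minimal sketch using only the Python standard library; the function name `approx_probability` and the choice to return \(h\) and \(q\) alongside the final value are illustrative assumptions, not part of the original solution.

```python
from math import comb, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, playing the role of \Phi


def approx_probability(K, k, L, l, M, m, p):
    """Evaluate the approximation C(K,k) q^k (1-q)^(K-k) from the question.

    Hypothetical helper: returns (h, q, value) so the intermediate
    quantities can be inspected as well.
    """
    # h = L * Phi((2m - 1 - 2Mp) / (2 sqrt(Mp(1-p)))):
    # approximate expected number of coins with fewer than m heads
    h = L * PHI((2 * m - 1 - 2 * M * p) / (2 * sqrt(M * p * (1 - p))))
    # q = Phi((2h - 2l - 1) / (2 sqrt(h))):
    # approximate probability that more than l coins have fewer than m heads
    q = PHI((2 * h - 2 * l - 1) / (2 * sqrt(h)))
    return h, q, comb(K, k) * q**k * (1 - q) ** (K - k)


h, q, value = approx_probability(K=7, k=2, L=500, l=4, M=100, m=48, p=0.6)
print(f"h = {h:.4f}, q = {q:.6f}, approximation = {value:.6f}")
```

The printed values can be compared with the "approx" figures quoted for the numerical case in the question.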
Considering the case \(K=7\), \(k=2\), \(L=500\), \(l=4\), \(M=100\), \(m=48\) and \(p=0.6\): the first normal approximation depends on \(Mp\) and \(M(1-p)\) being large. They are \(60\) and \(40\) respectively, so this is likely to be a good approximation in absolute terms. However, since \(m = 48\) is far from the mean (in a normalised sense), we might expect the percentage error to be large. The first approximation finds that \begin{align*} h &= 500 \cdot \Phi \l \frac{2 \cdot 48 - 2 \cdot 60 - 1}{2\sqrt{24}} \r \\ &= 500 \cdot \Phi \l \frac{-25}{2 \sqrt{24}} \r \\ &\approx 500 \cdot \Phi (-2.55) \\ &\approx 500 \cdot 0.0054 \\ &\approx 2.7 \end{align*} The second normal approximation will be good if \(Lp_h \approx h \approx 2.7\) is large, but this is quite small. Therefore, we shouldn't expect this to be a good approximation.
[Alternatively, using what I expect is the desired approach] The approximation \(B(L, \frac{h}{L}) \approx Po(h)\) is acceptable since \(L>50\) and \(h < 5\). The approximation \(Po(h) \approx N(h,h)\) is not acceptable since \(h\) is small (in particular \(h < 15\)).
Finally, we can compute all these values exactly using a modern calculator. \begin{array}{l|cc} & \text{correct} & \text{approx} \\ \hline p_h & 0.005760\ldots & 0.005362\ldots \\ \P(C > l) & 0.164522\ldots & 0.133319\ldots \\ \text{ans} & 0.231389\ldots & 0.182516\ldots \end{array} We can also see how the errors propagate by redoing each step on the assumption that the previous steps are correct, and by including the Poisson step.
\begin{array}{lccc} & \text{correct} & \text{approx} & \text{using approx } p_h \\ \hline p_h & 0.005760\ldots & 0.005362\ldots & - \\ \P(C > l)\quad [Po(h)] & 0.164522\ldots & 0.165044\ldots & 0.134293\ldots \\ \P(C > l)\quad [N(h,h)] & 0.164522\ldots & 0.169953\ldots & 0.133319\ldots \\ \P(C > l)\quad [N(h,h(1-\frac{h}{L}))] & 0.164522\ldots & 0.169255\ldots & 0.132677\ldots \\ \text{ans} & 0.231389\ldots & 0.231389\ldots \end{array} By doing this, we discover that the largest error comes not from the second approximation but from the first, whose absolute error is small but whose relative error is large. That the second approximation does so well is, in fact, a coincidence; we can see this by investigating the specific values being used. The first approximation looks as follows:

TikZ diagram
You might not be able to tell, but there are actually two plots on this chart. Let's zoom in on the area we are worried about:
TikZ diagram
We can see there are small differences, which could be large in percentage terms (as we found when we computed them directly).
TikZ diagram
First, we can immediately see that if we just look at the distribution of \(B(L, p_h)\) and \(B(L, p_{h_\text{approx}})\) we get quite different results, even before we do any approximations.
TikZ diagram
If we plot the probability distribution of \(B(L, p_h)\) vs \(N(Lp_h, Lp_h(1-p_h))\) we find that it is not a great approximation.
TikZ diagram
However, the CDF happens to be a very good approximation *just* for the value we care about. Very lucky, but not possible for someone sitting STEP to know at the time!
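The "correct" values referred to in this solution can be recomputed by summing binomial probabilities directly. The following is a minimal sketch using only the Python standard library; the helper name `binom_cdf` is an illustrative assumption, and plain floating-point summation is assumed to be accurate enough at these sizes.

```python
from math import comb


def binom_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p), summed term by term (hypothetical helper)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(x + 1))


K, k, L, l, M, m, p = 7, 2, 500, 4, 100, 48, 0.6

# P(H_i < m): a single coin produces fewer than m heads in M throws
p_h = binom_cdf(M, p, m - 1)
# P(C > l): more than l of the L coins produce fewer than m heads
q = 1 - binom_cdf(L, p_h, l)
# Probability that this happens on exactly k of the K days
exact = comb(K, k) * q**k * (1 - q) ** (K - k)

print(f"p_h = {p_h:.6f}, P(C > l) = {q:.6f}, answer = {exact:.6f}")
```

Swapping `p_h` for its normal approximation before computing `q` is one way to see how much of the final error is inherited from the first step.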

1995 Paper 1 Q14
D: 1516.0 B: 1531.3

  1. Find the maximum value of \(\sqrt{p(1-p)}\) as \(p\) varies between \(0\) and \(1\).
  2. Suppose that a proportion \(p\) of the population is female. In order to estimate \(p\) we pick a sample of \(n\) people at random and find the proportion of them who are female. Find the value of \(n\) which ensures that the chance of our estimate of \(p\) being more than \(0.01\) in error is less than 1\%.
  3. Discuss how the required value of \(n\) would be affected if (a) \(p\) were the proportion of people in the population who are left-handed; (b) \(p\) were the proportion of people in the population who are millionaires.


Solution:

  1. \(\,\) \begin{align*} && \sqrt{p(1-p)} &= \sqrt{p-p^2} \\ &&&= \sqrt{\tfrac14-(\tfrac12-p)^2} \\ &&&\leq \sqrt{\tfrac14} = \tfrac12 \end{align*} Therefore the maximum is \(\tfrac12\), attained when \(p=\frac12\).
  2. Notice that our estimate \(\hat{p}\) will (for large \(n\)) approximately follow a normal distribution \(N(p, pq/n)\), where \(q = 1-p\), by either the normal approximation to the binomial or the central limit theorem. We would like \(\mathbb{P}\left ( |\hat{p}-p| > 0.01 \right) < 0.01\), or in other words \begin{align*} && 0.01 &> \mathbb{P}\left ( |\hat{p}-p| > 0.01 \right) \\ &&&=\mathbb{P}\left ( |\sqrt{\frac{pq}{n}}Z+p-p| > 0.01 \right) \\ &&&= \mathbb{P} \left (|Z|>\frac{0.01\sqrt{n}}{\sqrt{pq}}\right) \end{align*} Since \(\mathbb{P}(|Z| > 2.58) \approx 0.01\), we need \(\frac{0.01\sqrt{n}}{\sqrt{pq}}> 2.58 \Rightarrow n > 258^2 pq\). Taking \(pq = \frac14\) as the worst case, this gives \(n > 258^2/4 = 16\,641\), or roughly \(2^{14} \approx 16\,000\) using \(258 \approx 256 = 2^8\).
  3. (a) If \(p\) is the proportion of left-handed people (maybe ~\(10\%\)), we would have \(pq = \frac{9}{100}\), so we would need a smaller sample. (b) If \(p\) is the proportion of millionaires (a smaller percentage again), we would need an even smaller sample. This is surprising, since you might expect to need a larger sample to gauge smaller proportions accurately. However, the surprise is resolved by noting that the requirement is on the absolute error. For smaller values of \(p\) the relative error is larger, but the absolute error is smaller.
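
The sample sizes discussed in parts 2 and 3 can be sketched numerically as follows. The function name `required_n` is an illustrative assumption, and so are the values \(p = 0.1\) for left-handedness and \(p = 0.01\) for millionaires, which are placeholders rather than real population figures. The sketch uses the exact two-sided normal quantile instead of the table value \(2.58\), so its worst-case result comes out slightly below the \(258^2 pq\) bound.

```python
from math import ceil
from statistics import NormalDist


def required_n(p, error=0.01, risk=0.01):
    """Smallest n with z^2 * p(1-p) / error^2 <= n under the normal approximation.

    Hypothetical helper: `error` is the allowed absolute error in the estimate
    and `risk` is the allowed probability of exceeding it.
    """
    z = NormalDist().inv_cdf(1 - risk / 2)  # two-sided critical value
    return ceil(z**2 * p * (1 - p) / error**2)


# The values of p below are assumptions chosen purely for illustration.
cases = [
    ("worst case, p = 0.5", 0.5),
    ("left-handed, assumed p = 0.1", 0.1),
    ("millionaires, assumed p = 0.01", 0.01),
]
for label, p in cases:
    print(f"{label}: n >= {required_n(p)}")
```

Because the bound scales with \(p(1-p)\), the required \(n\) shrinks as \(p\) moves away from \(\frac12\), which is the effect described in part 3.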