Approximating Binomial to Normal Distribution

2006 Paper 3 Q12
D: 1700.0 B: 1500.0

Fifty times a year, 1024 tourists disembark from a cruise liner at a port. From there they must travel to the city centre either by bus or by taxi. Tourists are equally likely to be directed to the bus station or to the taxi rank. Each bus of the bus company holds 32 passengers, and the company currently runs 15 buses. The company makes a profit of \(\pounds\)1 for each passenger carried. It carries as many passengers as it can, with any excess being (eventually) transported by taxi. Show that the largest annual licence fee, in pounds, that the company should consider paying to be allowed to run an extra bus is approximately \[ 1600 \Phi(2) - \frac{800}{\sqrt{2\pi}}\big(1- \e^{-2}\big)\,, \] where \(\displaystyle \Phi(x) =\dfrac1{\sqrt{2\pi}} \int_{-\infty}^x \e^{-\frac12t^2}\d t\,\). You should not consider continuity corrections.

Solution
The number of people directed towards the buses on each cruise is \(X \sim B(1024, \tfrac12) \approx N(512, 256)\), so we may write \(X \approx 16Z + 512\) with \(Z \sim N(0,1)\). Without an extra bus, the expected profit per cruise is \(\mathbb{E}[\min(X, 15 \times 32)]\). With the extra bus, the expected profit per cruise is \(\mathbb{E}[\min(X, 16 \times 32)]\). Therefore the expected extra profit per cruise is: \begin{align*} \text{Expected extra profit} &= \mathbb{E}[\min(X, 16 \times 32)]-\mathbb{E}[\min(X, 15 \times 32)] \\ &= \mathbb{E}[\min(X, 16 \times 32)-\min(X, 15 \times 32)] \\ &\approx \mathbb{E}[\min(16Z+512, 16 \times 32)-\min(16Z+512, 15 \times 32)] \\ &= 16\mathbb{E}[\min(Z+32, 32)-\min(Z+32, 30)] \\ &=16\int_{-\infty}^{\infty} \left (\min(z+32, 32)-\min(z+32, 30) \right)p_Z(z) \d z \\ &= 16 \left ( \int_{-2}^{0} (z+32-30) p_Z(z) \d z + \int_0^\infty (32-30)p_Z(z) \d z \right) \\ &= 16 \left ( \int_{-2}^{0} (z+2) p_Z(z) \d z + \int_0^\infty 2p_Z(z) \d z \right) \\ &= 16 \left ( \int_{-2}^{0} zp_Z(z) \d z + 2\int_{-2}^\infty p_Z(z) \d z \right) \\ &= 16 \left ( \int_{-2}^{0} z \frac{1}{\sqrt{2\pi}} e^{-\frac12 z^2} \d z + 2\Phi(2) \right) \\ &= 32\Phi(2) + \frac{16}{\sqrt{2\pi}} \left [ -e^{-\frac12z^2} \right]_{-2}^0 \\ &= 32\Phi(2) - \frac{16}{\sqrt{2\pi}} \left ( 1-e^{-2}\right) \end{align*} Here the integrand vanishes for \(z < -2\), and \(\int_{-2}^\infty p_Z(z) \d z = 1 - \Phi(-2) = \Phi(2)\). Across the \(50\) cruises each year, this extra profit is \[ 1600\Phi(2) - \frac{800}{\sqrt{2\pi}} \left ( 1-e^{-2}\right), \] which is the largest licence fee the company should consider paying.
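As a numerical sanity check, the sketch below evaluates the expression stated in the question and compares it with the exact expected extra profit under \(X \sim B(1024, \tfrac12)\). The function names are illustrative only, and the exact sum is simply a brute-force expectation, not part of the intended exam solution.

```python
import math

def phi_cdf(x):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def licence_fee_normal():
    """Expression from the question: normal approximation, no continuity correction."""
    return 1600 * phi_cdf(2) - 800 / math.sqrt(2 * math.pi) * (1 - math.exp(-2))

def licence_fee_exact(n=1024, seats_now=15 * 32, seats_new=16 * 32, cruises=50):
    """Expected extra annual profit computed directly from X ~ B(n, 1/2)."""
    total = 0.0
    for k in range(n + 1):
        # Integer true division keeps full precision before rounding to a float.
        prob = math.comb(n, k) / 2**n
        total += (min(k, seats_new) - min(k, seats_now)) * prob
    return cruises * total

fee_normal = licence_fee_normal()
fee_exact = licence_fee_exact()
print(f"normal approximation: {fee_normal:.2f}")
print(f"exact binomial:       {fee_exact:.2f}")
```

The two figures should be close, which suggests that ignoring the continuity correction costs little here.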
2002 Paper 2 Q12
D: 1600.0 B: 1500.6

On \(K\) consecutive days each of \(L\) identical coins is thrown \(M\) times. For each coin, the probability of throwing a head in any one throw is \(p\) (where \(0 < p < 1\)). Show that the probability that on exactly \(k\) of these days more than \(l\) of the coins will each produce fewer than \(m\) heads can be approximated by \[ {K \choose k}q^k(1-q)^{K-k}, \] where \[ q=\Phi\left( \frac{2h-2l-1}{2\sqrt{h} }\right), \ \ \ \ \ \ h=L\Phi\left( \frac{2m-1-2Mp}{2\sqrt{ Mp(1-p)}}\right) \] and \(\Phi(\cdot)\) is the cumulative distribution function of a standard normal variate. Would you expect this approximation to be accurate in the case \(K=7\), \(k=2\), \(L=500\), \(l=4\), \(M=100\), \(m=48\) and \(p=0.6\;\)?

Solution
Let \(H_i\) be the number of heads the \(i\)th coin throws on a given day. Then \(H_i \sim B(M,p)\), and the probability that a given coin produces fewer than \(m\) heads is \(p_h = \P(H_i < m)\). Let \(C\) be the number of coins producing fewer than \(m\) heads on a given day; then \(C \sim B(L, p_h)\). The probability that more than \(l\) of the coins produce fewer than \(m\) heads is therefore \(\P(C > l)\). Finally, the probability that on exactly \(k\) days more than \(l\) of the coins will produce fewer than \(m\) heads is: \[ \binom{K}{k} \cdot \P(C > l)^k \cdot (1-\P(C > l))^{K-k} \] Let's start by assuming that all our binomials can be approximated by normal distributions. \(B(M,p) \approx N(Mp, Mp(1-p))\) and so: \begin{align*} p_h &= \P(H_i < m) \\ &\approx \P( \sqrt{Mp(1-p)}Z+Mp < m-\frac12) \\ &= \P \l Z < \frac{2m-2Mp-1}{2\sqrt{Mp(1-p)}} \r \\ &= \Phi\l\frac{2m-2Mp-1}{2\sqrt{Mp(1-p)}} \r \end{align*} With \(h\) as defined in the question, \(B(L, p_h) \approx B \l L, \P \l Z < \frac{2m-2Mp-1}{2\sqrt{Mp(1-p)}} \r\r = B(L, \frac{h}{L}) \approx N(h, \frac{h(L-h)}{L})\). Therefore \begin{align*} \P(C > l) &= 1-\P(C \leq l) \\ &\approx 1- \P \l \sqrt{\frac{h(L-h)}{L}} Z + h \leq l+\frac12 \r \\ &= 1 - \P \l Z \leq \frac{2l-2h+1}{2\sqrt{\frac{h(L-h)}{L}}}\r \\ &= 1- \Phi\l \frac{2l-2h+1}{2\sqrt{\frac{h(L-h)}{L}}} \r \\ &= \Phi\l \frac{2h-2l-1}{2\sqrt{\frac{h(L-h)}{L}}} \r \end{align*} If we can approximate \(\sqrt{1-\frac{h}{L}}\) by \(1\) then we obtain the approximation in the question. Alternatively, \(B(L, \frac{h}{L}) \approx Po(h)\) and \(Po(h) \approx N(h,h)\), so we obtain: \begin{align*} \P(C > l) &= 1-\P(C \leq l) \\ &\approx 1 - \P(\sqrt{h} Z +h < l + \frac12) \\ &= 1 - \P \l Z < \frac{2l-2h+1}{2\sqrt{h}} \r \\ &= \Phi \l \frac{2h - 2l -1}{2\sqrt{h}}\r \end{align*} as required. [I think this is what the examiners expected]. 
Consider the case \(K=7\), \(k=2\), \(L=500\), \(l=4\), \(M=100\), \(m=48\) and \(p=0.6\). The first normal approximation depends on \(Mp\) and \(M(1-p)\) being large. They are \(60\) and \(40\) respectively, so this is likely a good approximation. It gives \begin{align*} h &= 500 \cdot \Phi \l \frac{2 \cdot 48 - 2 \cdot 60 - 1}{2\sqrt{24}} \r \\ &= 500 \cdot \Phi \l \frac{-25}{2 \sqrt{24}} \r \\ &\approx 500 \cdot \Phi (-2.55) \\ &\approx 500 \cdot 0.0054 \\ &\approx 2.7 \end{align*} The normal approximation to the second binomial will be good if \(500 \cdot \frac{2.7}{500} = 2.7\) is large, but this is quite small. Therefore, we shouldn't expect this to be a good approximation. Moreover, since \(m = 48\) is far from the mean of \(60\) (about \(2.5\) standard deviations below it), we might expect the percentage error in the first approximation to be large. [Alternatively, using what I expect is the desired approach] The approximation \(B(L, \frac{h}{L}) \approx Po(h)\) is acceptable since \(L>50\) and \(h < 5\). The approximation \(Po(h) \approx N(h,h)\) is not acceptable since \(h\) is small (in particular \(h < 15\)). Finally, we can compute all these values exactly using a modern calculator. \begin{array}{l|cc} & \text{correct} & \text{approx} \\ \hline p_h & 0.005760\ldots & 0.005362\ldots \\ \P(C > l) & 0.164522\ldots & 0.133319\ldots \\ \text{ans} & 0.231389\ldots & 0.182516\ldots \end{array} We can also see how the errors propagate, by doing the calculations assuming the previous steps are correct, and also including the Poisson step. 
\begin{array}{lccc} & \text{correct} & \text{approx} & \text{using approx } p_h \\ \hline p_h & 0.005760\ldots & 0.005362\ldots & - \\ \P(C > l)\quad [Po(h)] & 0.164522\ldots & 0.165044\ldots & 0.134293\ldots \\ \P(C > l)\quad [N(h,h)] & 0.164522\ldots & 0.169953\ldots & 0.133319\ldots \\ \P(C > l)\quad [N(h,h(1-\frac{h}{L}))] & 0.164522\ldots & 0.169255\ldots & 0.132677\ldots \\ \text{ans} & 0.231389\ldots & 0.231389\ldots \end{array} By doing this, we discover that the largest errors actually come not from the second approximation but from the small absolute (but large relative) error in the first approximation. This is, in fact, a coincidence; we can observe it by investigating the specific values being used. The first approximation looks as follows:
TikZ diagram
You might not be able to tell, but there are actually two plots on this chart. Let's zoom in on the area we are worried about:
TikZ diagram
We can see there are small differences, which could be large in percentage terms (as we found when we computed them directly).
TikZ diagram
First, we can immediately see that if we just look at the distribution of \(B(L, p_h)\) and \(B(L, p_{h_\text{approx}})\) we get quite different results, even before we do any approximations.
TikZ diagram
If we plot the probability distribution of \(B(L, p_h)\) vs \(N(Lp_h, Lp_h(1-p_h))\) we find that it is not a great approximation.
TikZ diagram
However, the CDF happens to be a very good approximation *just* for the value we care about. Very lucky, but not possible for someone sitting STEP to know at the time!
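The "correct" and "approx" columns of the first table above can be recomputed with a few lines of code. This is a minimal sketch using only the Python standard library; the helper names `binom_cdf` and `answer` are illustrative, and the exact values are obtained by summing the binomial pmf directly.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed directly from the pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

K, k, L, l, M, m, p = 7, 2, 500, 4, 100, 48, 0.6
Phi = NormalDist().cdf  # standard normal CDF

# Step 1: probability that a single coin gives fewer than m heads.
p_h_exact = binom_cdf(m - 1, M, p)
p_h_approx = Phi((2 * m - 1 - 2 * M * p) / (2 * math.sqrt(M * p * (1 - p))))

# Step 2: probability that more than l coins do so on a given day.
q_exact = 1 - binom_cdf(l, L, p_h_exact)
h = L * p_h_approx
q_approx = Phi((2 * h - 2 * l - 1) / (2 * math.sqrt(h)))

def answer(q):
    """Probability that exactly k of the K days are days of this kind."""
    return math.comb(K, k) * q**k * (1 - q)**(K - k)

print(f"p_h:      exact {p_h_exact:.6f}, approx {p_h_approx:.6f}")
print(f"P(C > l): exact {q_exact:.6f}, approx {q_approx:.6f}")
print(f"answer:   exact {answer(q_exact):.6f}, approx {answer(q_approx):.6f}")
```

Swapping `p_h_exact` into the formula for `h` is one way to reproduce the error-propagation comparison in the second table.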
1995 Paper 1 Q14
D: 1516.0 B: 1531.3

  1. Find the maximum value of \(\sqrt{p(1-p)}\) as \(p\) varies between \(0\) and \(1\).
  2. Suppose that a proportion \(p\) of the population is female. In order to estimate \(p\) we pick a sample of \(n\) people at random and find the proportion of them who are female. Find the value of \(n\) which ensures that the chance of our estimate of \(p\) being more than \(0.01\) in error is less than 1\%.
  3. Discuss how the required value of \(n\) would be affected if (a) \(p\) were the proportion of people in the population who are left-handed; (b) \(p\) were the proportion of people in the population who are millionaires.

Solution
  1. \(\,\) \begin{align*} && \sqrt{p(1-p)} &= \sqrt{p-p^2} \\ &&&= \sqrt{\tfrac14-(\tfrac12-p)^2} \\ &&&\leq \sqrt{\tfrac14} = \tfrac12 \end{align*} Therefore the maximum is \(\tfrac12\), attained when \(p=\frac12\).
  2. Notice that our estimate \(\hat{p}\) will (for large \(n\)) approximately follow a normal distribution \(N(p, pq/n)\), where \(q = 1-p\), by either the normal approximation to the binomial or the central limit theorem. We would like the probability that the estimate is more than \(0.01\) in error to be less than \(0.01\), in other words \begin{align*} && 0.01 &> \mathbb{P}\left ( |\hat{p}-p| > 0.01 \right) \\ &&&=\mathbb{P}\left ( |\sqrt{\frac{pq}{n}}Z+p-p| > 0.01 \right) \\ &&&= \mathbb{P} \left (|Z|>\frac{0.01\sqrt{n}}{\sqrt{pq}}\right) \end{align*} Therefore we need \(\frac{0.01\sqrt{n}}{\sqrt{pq}}> 2.58 \Rightarrow n > 258^2 pq\). Using \(pq = \frac14\) as the worst case, this is \(n > \frac{258^2}{4} = 16\,641\), or roughly \(2^{14} \approx 16\,000\) using \(258 \approx 256 = 2^8\).
  3. If we were looking at left-handed people (maybe ~\(10\%\)), we would have \(pq \approx \frac{9}{100}\), so we would need a smaller sample. If we were looking at millionaires (a smaller percentage again), we would need an even smaller sample. This is surprising, since you would expect to need a larger sample to accurately gauge smaller proportions. However, the surprise is resolved by noting that the requirement is on the absolute error: for smaller proportions the same absolute error corresponds to a larger relative error.
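The dependence of the required sample size on \(p\) can be tabulated directly. The sketch below assumes the same normal approximation as the solution, and the proportions \(0.1\) for left-handedness and \(0.001\) for millionaires are illustrative guesses, not values from the question.

```python
import math
from statistics import NormalDist

def required_sample_size(p, margin=0.01, tail=0.01):
    """Smallest n with P(|p_hat - p| > margin) < tail, under the normal approximation."""
    z = NormalDist().inv_cdf(1 - tail / 2)  # two-sided critical value, close to 2.58
    bound = z**2 * p * (1 - p) / margin**2   # we need n strictly greater than this
    return math.floor(bound) + 1

# Hypothetical proportions, chosen only for illustration.
for label, p in [("worst case", 0.5), ("left-handed", 0.1), ("millionaires", 0.001)]:
    print(f"{label:>12}: p = {p}, n = {required_sample_size(p)}")
```

The worst case comes out slightly below the hand estimate of \(16\,641\) because the critical value is a little less than \(2.58\).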
1991 Paper 2 Q16
D: 1600.0 B: 1516.0

Each time it rains over the Cabbibo dam, a volume \(V\) of water is deposited, almost instantaneously, in the reservoir. Each day (midnight to midnight) water flows from the reservoir at a constant rate \(u\) units of volume per day. An engineer, if present, may choose to alter the value of \(u\) at any midnight.

  1. Suppose that it rains at most once in any day, that there is a probability \(p\) that it will rain on any given day and that, if it does, the rain is equally likely to fall at any time in the 24 hours (i.e. the time at which the rain falls is a random variable uniform on the interval \([0,24]\)). The engineer decides to take two days' holiday starting at midnight. If at this time the volume of water in the reservoir is \(V\) below the top of the dam, find an expression for \(u\) such that the probability of overflow in the two days is \(Q\), where \(Q < p^{2}.\)
  2. For the engineer's summer holidays, which last 18 days, the reservoir is drained to a volume \(kV\) below the top of the dam and the rate of outflow \(u\) is set to zero. The engineer wants to drain off as little as possible, consistent with the requirement that the probability that the dam will overflow is less than \(\frac{1}{10}.\) In the case \(p=\frac{1}{3},\) find by means of a suitable approximation the required value of \(k\).
  3. Suppose instead that it may rain at most once before noon and at most once after noon each day, that the probability of rain in any given half-day is \(\frac{1}{6}\) and that it is equally likely to rain at any time in each half-day. Is the required value of \(k\) lower or higher?

Solution
  1. It cannot overflow on the first day, since the reservoir starts \(V\) below the top. The only way it can overflow is if it rains on both days, which occurs with probability \(p^2\). After the first rainfall the level is \(ut\) below the top at time \(t\) (measured in days from the start), so overflow occurs if the second rainfall arrives too early, i.e. \(V - u(1+t_2) > 0\), where \(t_2\) is the time of the rain on day 2 (as a fraction of a day). That is, \(t_2 < \frac{V}{u}-1\). Since \(t_2\) is uniform on \([0,1]\), \begin{align*} && Q &= p^2 \left (\frac{V}{u} - 1 \right) \\ \Rightarrow && u &= \frac{Vp^2}{p^2+Q} \end{align*}
  2. The reservoir overflows during these \(18\) days if it rains more than \(k\) times. The number of times it rains, \(X\), is \(B(18, \tfrac13)\); since \(18 \cdot \tfrac13 = 6 > 5\), a normal approximation is reasonable, i.e. \(X \approx N(6, 4)\). We want \(\mathbb{P}(X > k) < \tfrac1{10}\), which with a continuity correction becomes \(\mathbb{P}(N(6,4) > k + 0.5) < \tfrac1{10}\). Therefore \(k \geq 1.28 \cdot 2 + 6 - 0.5 \approx 8.1\), so they should set \(k\) to \(9\).
  3. In this case we have \(B(36, \tfrac16)\), approximated by \(N(6, 5)\), which has a larger standard deviation; therefore we would expect to need a larger value for \(k\). [It turns out to actually be the same, but there's no reason to expect students without a calculator to be able to establish this]
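The remark that both rain models lead to the same \(k\) can be checked with exact binomial tails. This is a small sketch with illustrative helper names; it restricts \(k\) to integers, which is enough here because overflow depends only on the number of rainfalls.

```python
import math

def upper_tail(k, n, p):
    """P(X > k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))

def least_k(n, p, risk=0.1):
    """Smallest integer k such that P(X > k) < risk."""
    for k in range(n + 1):
        if upper_tail(k, n, p) < risk:
            return k

k_daily = least_k(18, 1 / 3)      # at most one rainfall per day
k_half_daily = least_k(36, 1 / 6)  # at most one rainfall per half-day
print(k_daily, k_half_daily)
```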
1989 Paper 3 Q16
D: 1700.0 B: 1484.0

It is believed that the population of Ruritania can be described as follows:

  1. \(25\%\) are fair-haired and the rest are dark-haired;
  2. \(20\%\) are green-eyed and the rest hazel-eyed;
  3. the population can also be divided into narrow-headed and broad-headed;
  4. no narrow-headed person has green eyes and fair hair;
  5. those who are green-eyed are as likely to be narrow-headed as broad-headed;
  6. those who are green-eyed and broad-headed are as likely to be fair-haired as dark-haired;
  7. half of the population is broad-headed and dark-haired;
  8. a hazel-eyed person is as likely to be fair-haired and broad-headed as dark-haired and narrow-headed.
Find the proportion believed to be narrow-headed. I am acquainted with only six Ruritanians, all of whom are broad-headed. Comment on this observation as evidence for or against the given model. A random sample of 200 Ruritanians is taken and is found to contain 50 narrow-heads. On the basis of the given model, calculate (to a reasonable approximation) the probability of getting 50 or fewer narrow-heads. Comment on the result.

Solution
TikZ diagram
Conditions tell us: \begin{align*} && a+b+d+e &= 0.25 \\ && b+c+e+f &= 0.2 \\ && e &= 0 \\ && b+c &= e + f \\ && b &= c \\ && c+h &= 0.5 \\ && a &= g \\ \end{align*}
TikZ diagram
So \(4b = 0.2 \Rightarrow b = 0.05\)
TikZ diagram
And \begin{align*} && 0.25 &= a + d + 0.05 \\ && 1 &= 2a + d + 0.65 \\ \Rightarrow && a &= 0.15 \\ && d &= 0.05 \end{align*}
TikZ diagram
So the proportion who are narrow-headed is \(30\%\). If the six acquaintances were a random sample, the chance that all of them are broad-headed would be \(0.7^6 \approx 12\%\), which is relatively unlikely; but friendship groups are likely to be biased, so it's not too surprising. Assuming there is a sufficiently large number of Ruritanians, we might model the number of narrow-headed Ruritanians in a sample of \(200\) as \(X \sim B(200, 0.3)\). Computing \(\mathbb{P}(X \leq 50)\) by hand is tricky, so let's use a normal approximation: \(X \approx N(60, 42)\) and \begin{align*} \mathbb{P}(X \leq 50) &\approx \mathbb{P} \left (Z \leq \frac{50 - 60+0.5}{\sqrt{42}} \right) \\ &\approx \mathbb{P} \left (Z \leq -\frac{9.5}{6.5} \right) \\ &\approx \mathbb{P} \left (Z \leq -\frac{3}{2} \right) \\ &\approx 7\% \end{align*} (more precisely, this approximation gives \(7.1\%\) and the exact binomial value is \(7.0\%\)). This also seems somewhat surprising if the model is correct.
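The three probabilities discussed above can be reproduced as follows. This is a sketch under the stated model \(X \sim B(200, 0.3)\); the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 200, 0.3  # sample size and model proportion of narrow-heads

# Chance that six randomly chosen Ruritanians are all broad-headed.
all_six_broad = (1 - p) ** 6

# Exact binomial probability of 50 or fewer narrow-heads.
exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(51))

# Normal approximation N(60, 42) with a continuity correction.
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p))).cdf(50.5)

print(f"P(all six broad-headed) = {all_six_broad:.4f}")
print(f"P(X <= 50): exact {exact:.4f}, normal approximation {normal:.4f}")
```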
1988 Paper 1 Q14
D: 1500.0 B: 1529.3

Let \(X\) be a standard normal random variable. If \(M\) is any real number, the random variable \(X_{M}\) is defined in terms of \(X\) by \[ X_{M}=\begin{cases} X & \mbox{if }X < M,\\ M & \mbox{if }X\geqslant M. \end{cases} \] Show that the expectation of \(X_{M}\) is given by \[ \mathrm{E}(X_{M})=-\phi(M)+M(1-\Phi(M)), \] where \(\phi\) is the probability density function, and \(\Phi\) is the cumulative distribution function of \(X\). Fifty times a year, 1024 tourists disembark from a cruise liner at the port of Slaka. From there they must travel to the capital either by taxi or by bus. Officials of HOGPo are equally likely to direct a tourist to the bus station or to the taxi rank. Each bus of the bus cooperative holds 31 passengers, and the cooperative currently runs 16 buses. The bus cooperative makes a profit of 1 vloska for each passenger carried. It carries all the passengers it can, with any excess being (eventually) transported by taxi. What is the largest annual bribe the bus cooperative should consider paying to HOGPo in order to be allowed to run an extra bus?

Solution
Let \(X \sim N(0,1)\), and \(\displaystyle X_{M}=\begin{cases} X & \text{if }X < M,\\ M & \text{if }X\geqslant M. \end{cases}\) Then we can calculate: \begin{align*} \mathbb{E}[X_M] &= \int_{-\infty}^M xf_X(x)\,dx + M\mathbb{P}(X \geq M) \\ &= \int_{-\infty}^M x \frac1{\sqrt{2\pi}}e^{-\frac12x^2}\,dx + M\mathbb{P}(X \geq M) \\ &= \left [ -\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2} \right ]_{-\infty}^M + M (1-\mathbb{P}(X < M)) \\ &= -\phi(M) + M(1-\Phi(M)) \end{align*} Let \(B \sim B\left (1024, \frac12 \right)\) be the number of potential bus passengers. Then \(B \approx N(512, 256) = N(512, 16^2)\), which is a good approximation since both \(np\) and \(nq\) are large. The question is asking how much additional profit the bus cooperative would make if it ran an additional bus. Currently on each cruise there is (on average) \(512\) passengers' worth of demand, but the cooperative can only supply \(496\) seats, so we should expect that there is demand for another bus. The question is how much that demand is worth. Using the first part of the question, we can see that the profit is something like a `capped normal', \(X_M\), except scaled and with a different cap. So we are interested in \(\displaystyle Y_{M}=\begin{cases} B & \mbox{if }B< M,\\ M & \mbox{if }B\geqslant M, \end{cases}\) but since \(B \approx N\left (512,16^2\right)\) this is similar to \begin{align*} Y_{M}&=\begin{cases} 16X+512 & \mbox{if }16X+512< M,\\ M & \mbox{if }16X+512\geq M. \end{cases} \\ &= \begin{cases} 16X+512 & \mbox{if }X< \frac{M-512}{16},\\ M & \mbox{if }X \geq \frac{M-512}{16}. \end{cases} \\ &= 16X_{\frac{M-512}{16}} + 512\end{align*} We are interested in \(\mathbb{E}[Y_{16\times31}]\) and \(\mathbb{E}[Y_{17\times31}]\), which are \(16\mathbb{E}[X_{-1}]+512\) and \(16\mathbb{E}[X_{\frac{15}{16}}]+512\). Since \(\frac{15}{16} \approx 1\), let's look at \(16(\mathbb{E}[X_1] - \mathbb{E}[X_{-1}])\). Using \(\phi(-1) = \phi(1)\) and \(\Phi(-1) = 1 - \Phi(1)\), \begin{align*} \mathbb{E}[X_1] - \mathbb{E}[X_{-1}] &= \left ( -\phi(1) + 1-\Phi(1)\right) - \left ( - \phi(-1) -(1 - \Phi(-1)) \right ) \\ &= -\phi(1) + \phi(-1) + 1-\Phi(1) + 1 - \Phi(-1) \\ &= 1 - \Phi(1) + \Phi(1) \\ &= 1 \end{align*} Therefore roughly \(16\) of the extra \(31\) seats will be filled on each cruise. (This is a slight overestimate, which is worth bearing in mind.) A better approximation might be that \(\mathbb{E}[X_t] - \mathbb{E}[X_{-1}] \approx \frac{t +1}{2}\) for \(t \approx 1\) (since we want something increasing). This would give us an approximation of \(15.5\), which is very close to the `true' answer. Therefore, over \(50\) bus runs, we should earn roughly \(800\) vloska extra from an additional bus. (Again this is an overestimate, and with an uncertain pay-off, they should consider offering maybe \(600\).) Since this is the future, we can quite easily calculate the exact values using the binomial distribution on a computer. This gives the true value as \(15.833\) per cruise, and so they should pay up to \(791\) vloska.
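The computer calculation mentioned at the end can be sketched as follows. It evaluates the capped-normal formula at the actual caps \(-1\) and \(\frac{15}{16}\) and compares the result with a brute-force expectation under \(B(1024, \tfrac12)\). The function names are illustrative, not taken from the original solution.

```python
import math
from statistics import NormalDist

std = NormalDist()  # standard normal

def capped_mean(M):
    """E[X_M] = -phi(M) + M (1 - Phi(M)) for X ~ N(0, 1) capped at M."""
    return -std.pdf(M) + M * (1 - std.cdf(M))

def extra_profit_normal(seats_now=16 * 31, seats_new=17 * 31, mu=512, sigma=16):
    """Extra passengers per cruise under the N(512, 16^2) approximation."""
    a = (seats_now - mu) / sigma  # -1
    b = (seats_new - mu) / sigma  # 15/16
    return sigma * (capped_mean(b) - capped_mean(a))

def extra_profit_exact(n=1024, seats_now=16 * 31, seats_new=17 * 31):
    """Extra passengers per cruise computed directly from B(n, 1/2)."""
    return sum(
        (min(k, seats_new) - min(k, seats_now)) * (math.comb(n, k) / 2**n)
        for k in range(n + 1)
    )

per_cruise_normal = extra_profit_normal()
per_cruise_exact = extra_profit_exact()
print(f"normal: {per_cruise_normal:.3f} per cruise, {50 * per_cruise_normal:.0f} per year")
print(f"exact:  {per_cruise_exact:.3f} per cruise, {50 * per_cruise_exact:.0f} per year")
```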
1987 Paper 1 Q16
D: 1500.0 B: 1500.0

The parliament of Laputa consists of 60 Preservatives and 40 Progressives. Preservatives never change their mind, always voting the same way on any given issue. Progressives vote at random on any given issue.

  1. A randomly selected member is known to have voted the same way twice on a given issue. Find the probability that the member will vote the same way a third time on that issue.
  2. Following a policy change, a proportion \(\alpha\) of Preservatives now consistently votes against Preservative policy. The Preservative leader decides that an election must be called if the value of \(\alpha\) is such that, at any vote on an item of Preservative policy, the chance of a simple majority would be less than 80\%. By making a suitable normal approximation, estimate the least value of \(\alpha\) which will result in an election being called.

Solution
  2. The vote will now be \(60(1-\alpha)\) for and \(60\alpha\) against, with \(X \sim B(40, \frac12)\) Progressives voting for at random. For a majority, the Preservatives need \(60(1-\alpha) + X > 50\), so no election is called while \(\P(X > 60\alpha - 10) \geq 0.8\). Using a normal approximation to the binomial, \(X \approx N(20, 10)\), so with a continuity correction \begin{align*} \P(X > 60 \alpha - 10) &= 1- \P(X \leq 60 \alpha - 10) \\ &\approx 1 - \P(\sqrt{10}Z+20 \leq 60\alpha - 9.5) \\ &= 1 - \P \l Z \leq \frac{60\alpha - 29.5}{\sqrt{10}} \r \end{align*} For this to be at least \(0.8\) we need \( \frac{60\alpha - 29.5}{\sqrt{10}} \leq -0.8416 \Rightarrow \alpha \leq 0.4473\). This corresponds to 26 or fewer defectors, or 34 or more remaining Preservatives, so an election is called once 27 Preservatives defect, i.e. the least value is \(\alpha = 0.45\). [An exact computation using the binomial distribution agrees: at least 17 Progressives join at random at least 80% of the time, so 34 Preservatives are required]
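The exact binomial computation referred to in the bracketed remark can be sketched as below. The function names are illustrative, and the code assumes a 100-member parliament in which 51 votes form a simple majority.

```python
import math

def majority_probability(defectors, preservatives=60, progressives=40):
    """P(more than 50 votes in favour) when `defectors` Preservatives vote against."""
    # Number of Progressive votes needed on top of the loyal Preservatives.
    votes_needed = 51 - (preservatives - defectors)
    favourable = sum(
        math.comb(progressives, x)
        for x in range(max(votes_needed, 0), progressives + 1)
    )
    return favourable / 2**progressives

def least_defectors_for_election(threshold=0.8):
    """Smallest number of defectors for which the chance of a majority drops below threshold."""
    for d in range(61):
        if majority_probability(d) < threshold:
            return d

d = least_defectors_for_election()
print(f"election called at {d} defectors, alpha = {d / 60:.3f}")
print(f"P(majority) with {d - 1} defectors: {majority_probability(d - 1):.4f}")
print(f"P(majority) with {d} defectors: {majority_probability(d):.4f}")
```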