Problem Text
On $K$ consecutive days each of $L$ identical coins is thrown $M$ times. For each coin, the probability of throwing a head in any one throw is $p$ (where $0 < p < 1$). Show that the probability that on exactly $k$ of these days more than $l$ of the coins will each produce fewer than $m$ heads can be approximated by \[ {K \choose k}q^k(1-q)^{K-k}, \] where \[ q=\Phi\left( \frac{2h-2l-1}{2\sqrt{h} }\right), \ \ \ \ \ \ h=L\Phi\left( \frac{2m-1-2Mp}{2\sqrt{ Mp(1-p)}}\right) \] and $\Phi(\cdot)$ is the cumulative distribution function of a standard normal variate. Would you expect this approximation to be accurate in the case $K=7$, $k=2$, $L=500$, $l=4$, $M=100$, $m=48$ and $p=0.6\;$?
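As a rough numerical companion to the formula in the question, the following sketch evaluates the stated approximation for the given parameter values. The function names `phi` and `approx_probability` are illustrative choices, not part of the question; $\Phi$ is computed from the error function in Python's standard library.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def approx_probability(K, k, L, l, M, m, p):
    """Evaluate the approximation stated in the question (hypothetical helper)."""
    # h = L * Phi((2m - 1 - 2Mp) / (2 sqrt(Mp(1-p))))
    h = L * phi((2 * m - 1 - 2 * M * p) / (2 * sqrt(M * p * (1 - p))))
    # q = Phi((2h - 2l - 1) / (2 sqrt(h)))
    q = phi((2 * h - 2 * l - 1) / (2 * sqrt(h)))
    return comb(K, k) * q**k * (1 - q) ** (K - k)


value = approx_probability(K=7, k=2, L=500, l=4, M=100, m=48, p=0.6)
print(value)
```

This only evaluates the formula; it says nothing by itself about how accurate the approximation is, which is the point of the final part of the question.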
Solution
Let $H_i$ be the number of heads the $i$th coin produces on a given day. Then $H_i \sim B(M,p)$, and the probability that a given coin produces fewer than $m$ heads is $p_h = \P(H_i < m)$. Let $C$ be the number of coins producing fewer than $m$ heads on a given day; then $C \sim B(L, p_h)$. The probability that more than $l$ of the coins produce fewer than $m$ heads is therefore $\P(C > l)$. Finally, since the days are independent, the probability that on exactly $k$ days more than $l$ of the coins will produce fewer than $m$ heads is: \[ \binom{K}{k} \cdot \P(C > l)^k \cdot (1-\P(C > l))^{K-k} \]

Let's start by assuming that all our binomials can be approximated by normal distributions. $B(M,p) \approx N(Mp, Mp(1-p))$ and so, using a continuity correction: \begin{align*} p_h &= \P(H_i < m) \\ &\approx \P( \sqrt{Mp(1-p)}Z+Mp < m-\frac12) \\ &= \P \l Z < \frac{2m-2Mp-1}{2\sqrt{Mp(1-p)}} \r \\ &= \Phi\l\frac{2m-2Mp-1}{2\sqrt{Mp(1-p)}} \r \end{align*}

$B(L, p_h) \approx B \l L, \P \l Z < \frac{2m-2Mp-1}{2\sqrt{Mp(1-p)}} \r\r = B(L, \frac{h}{L}) \approx N(h, \frac{h(L-h)}{L})$. Therefore: \begin{align*} \P(C > l) &= 1-\P(C \leq l) \\ &\approx 1- \P \l \sqrt{\frac{h(L-h)}{L}} Z + h \leq l+\frac12 \r \\ &= 1 - \P \l Z \leq \frac{2l-2h+1}{2\sqrt{\frac{h(L-h)}{L}}}\r \\ &= 1- \Phi\l \frac{2l-2h+1}{2\sqrt{\frac{h(L-h)}{L}}} \r \\ &= \Phi\l \frac{2h-2l-1}{2\sqrt{\frac{h(L-h)}{L}}} \r \end{align*} Since $\sqrt{\frac{h(L-h)}{L}} = \sqrt{h}\sqrt{1-\frac{h}{L}}$, if we can approximate $\sqrt{1-\frac{h}{L}}$ by $1$ then we obtain the approximation in the question.

Alternatively, $B(L, \frac{h}{L}) \approx Po(h)$ and $Po(h) \approx N(h,h)$, so we obtain: \begin{align*} \P(C > l) &= 1-\P(C \leq l) \\ &\approx 1 - \P(\sqrt{h} Z +h \leq l + \frac12) \\ &= 1 - \P \l Z \leq \frac{2l-2h+1}{2\sqrt{h}} \r \\ &= \Phi \l \frac{2h - 2l -1}{2\sqrt{h}}\r \end{align*} as required. [I think this is what the examiners expected.]

Now consider the case $K=7$, $k=2$, $L=500$, $l=4$, $M=100$, $m=48$ and $p=0.6$. The first normal approximation depends on $Mp$ and $M(1-p)$ both being large.
They are $60$ and $40$ respectively, so this is likely a good approximation. The first approximation finds that \begin{align*} h &= 500 \cdot \Phi \l \frac{2 \cdot 48 - 2 \cdot 60 - 1}{2\sqrt{24}} \r \\ &= 500 \cdot \Phi \l \frac{-25}{2 \sqrt{24}} \r \\ &\approx 500 \cdot \Phi (-2.55) \\ &\approx 500 \cdot 0.0054 \\ &\approx 2.7 \end{align*} However, since $m = 48$ is far from the mean (about $2.55$ standard deviations below it), we might expect the percentage error in this tail probability to be large.

The second normal approximation, of $B(L, \frac{h}{L})$, will be good only if its mean $500 \cdot \frac{2.7}{500} = 2.7$ is large, but this is quite small. Therefore, we shouldn't expect this to be a good approximation.

[Alternatively, using what I expect is the desired approach.] The approximation $B(L, \frac{h}{L}) \approx Po(h)$ is acceptable since $L > 50$ and $h < 5$. The approximation $Po(h) \approx N(h,h)$ is not acceptable since $h$ is small (in particular $h < 15$).

Finally, we can compute all these values exactly using a modern calculator. \begin{array}{l|cc} & \text{correct} & \text{approx} \\ \hline p_h & 0.005760\ldots & 0.005362\ldots \\ \P(C > l) & 0.164522\ldots & 0.133319\ldots \\ \text{ans} & 0.231389\ldots & 0.182516\ldots \end{array} We can also see how the errors propagate by redoing each step assuming the previous steps are correct, and also including the Poisson step.
\begin{array}{lccc} & \text{correct} & \text{approx} & \text{using approx } p_h \\ \hline p_h & 0.005760\ldots & 0.005362\ldots & - \\ \P(C > l)\quad [Po(h)] & 0.164522\ldots & 0.165044\ldots & 0.134293\ldots \\ \P(C > l)\quad [N(h,h)] & 0.164522\ldots & 0.169953\ldots & 0.133319\ldots \\ \P(C > l)\quad [N(h,h(1-\frac{h}{L}))] & 0.164522\ldots & 0.169255\ldots & 0.132677\ldots \\ \text{ans} & 0.231389\ldots & 0.231389\ldots \end{array}

By doing this, we discover that the largest error comes not from the second approximation but from the small absolute (but large relative) error in the first approximation. This is, in fact, a coincidence, as we can see by investigating the specific values being used. The first approximation looks as follows:
\begin{center} \begin{tikzpicture}[scale=4] \draw[->] (0,0) -- (0,1.2); \draw[->] (0,0) -- (2,0); \filldraw[red] (0.0000, 0.0104) circle (0.3pt); \filldraw[blue] (0.0000, 0.0095) circle (0.3pt); \filldraw[red] (0.0667, 0.0186) circle (0.3pt); \filldraw[blue] (0.0667, 0.0174) circle (0.3pt); \filldraw[red] (0.1333, 0.0320) circle (0.3pt); \filldraw[blue] (0.1333, 0.0304) circle (0.3pt); \filldraw[red] (0.2000, 0.0531) circle (0.3pt); \filldraw[blue] (0.2000, 0.0511) circle (0.3pt); \filldraw[red] (0.2667, 0.0845) circle (0.3pt); \filldraw[blue] (0.2667, 0.0824) circle (0.3pt); \filldraw[red] (0.3333, 0.1292) circle (0.3pt); \filldraw[blue] (0.3333, 0.1274) circle (0.3pt); \filldraw[red] (0.4000, 0.1900) circle (0.3pt); \filldraw[blue] (0.4000, 0.1891) circle (0.3pt); \filldraw[red] (0.4667, 0.2686) circle (0.3pt); \filldraw[blue] (0.4667, 0.2691) circle (0.3pt); \filldraw[red] (0.5333, 0.3649) circle (0.3pt); \filldraw[blue] (0.5333, 0.3674) circle (0.3pt); \filldraw[red] (0.6000, 0.4764) circle (0.3pt); \filldraw[blue] (0.6000, 0.4812) circle (0.3pt); \filldraw[red] (0.6667, 0.5976) circle (0.3pt); \filldraw[blue] (0.6667, 0.6047) circle (0.3pt); \filldraw[red] (0.7333, 0.7204) circle (0.3pt);
\filldraw[blue] (0.7333, 0.7290) circle (0.3pt); \filldraw[red] (0.8000, 0.8341) circle (0.3pt); \filldraw[blue] (0.8000, 0.8430) circle (0.3pt); \filldraw[red] (0.8667, 0.9276) circle (0.3pt); \filldraw[blue] (0.8667, 0.9352) circle (0.3pt); \filldraw[red] (0.9333, 0.9905) circle (0.3pt); \filldraw[blue] (0.9333, 0.9953) circle (0.3pt); \filldraw[red] (1.0000, 1.0152) circle (0.3pt); \filldraw[blue] (1.0000, 1.0162) circle (0.3pt); \filldraw[red] (1.0667, 0.9986) circle (0.3pt); \filldraw[blue] (1.0667, 0.9953) circle (0.3pt); \filldraw[red] (1.1333, 0.9422) circle (0.3pt); \filldraw[blue] (1.1333, 0.9352) circle (0.3pt); \filldraw[red] (1.2000, 0.8525) circle (0.3pt); \filldraw[blue] (1.2000, 0.8430) circle (0.3pt); \filldraw[red] (1.2667, 0.7393) circle (0.3pt); \filldraw[blue] (1.2667, 0.7290) circle (0.3pt); \filldraw[red] (1.3333, 0.6142) circle (0.3pt); \filldraw[blue] (1.3333, 0.6047) circle (0.3pt); \filldraw[red] (1.4000, 0.4885) circle (0.3pt); \filldraw[blue] (1.4000, 0.4812) circle (0.3pt); \filldraw[red] (1.4667, 0.3719) circle (0.3pt); \filldraw[blue] (1.4667, 0.3674) circle (0.3pt); \filldraw[red] (1.5333, 0.2707) circle (0.3pt); \filldraw[blue] (1.5333, 0.2691) circle (0.3pt); \filldraw[red] (1.6000, 0.1883) circle (0.3pt); \filldraw[blue] (1.6000, 0.1891) circle (0.3pt); \filldraw[red] (1.6667, 0.1251) circle (0.3pt); \filldraw[blue] (1.6667, 0.1274) circle (0.3pt); \filldraw[red] (1.7333, 0.0793) circle (0.3pt); \filldraw[blue] (1.7333, 0.0824) circle (0.3pt); \filldraw[red] (1.8000, 0.0479) circle (0.3pt); \filldraw[blue] (1.8000, 0.0511) circle (0.3pt); \filldraw[red] (1.8667, 0.0276) circle (0.3pt); \filldraw[blue] (1.8667, 0.0304) circle (0.3pt); \filldraw[red] (1.9333, 0.0151) circle (0.3pt); \filldraw[blue] (1.9333, 0.0174) circle (0.3pt); \node at (0, 0.125) [left] {0.01}; \node at (0, 0.25) [left] {0.02}; \node at (0, 0.375) [left] {0.03}; \node at (0, 0.5) [left] {0.04}; \node at (0, 0.625) [left] {0.05}; \node at (0, 0.75) [left] 
{0.06}; \node at (0, 0.875) [left] {0.07}; \node at (0, 1.0) [left] {0.08}; \node at (0.0000, 0) [below] {45}; \node at (0.3333, 0) [below] {50}; \node at (0.6667, 0) [below] {55}; \node at (1.0000, 0) [below] {60}; \node at (1.3333, 0) [below] {65}; \node at (1.6667, 0) [below] {70}; \node at (2.0000, 0) [below] {75}; \end{tikzpicture} \end{center}

You might not be able to tell, but there are actually two plots on this chart: the exact binomial probabilities in red and the normal approximation in blue. Now let's zoom in on the area we are worried about, this time plotting cumulative probabilities:
\begin{center} \begin{tikzpicture}[scale=4] \draw[->] (0,0) -- (0,1.2); \draw[->] (0,0) -- (2,0); \filldraw[red] (0.0000, 0.1069) circle (0.3pt); \filldraw[blue] (0.0000, 0.0962) circle (0.3pt); \filldraw[red] (0.4000, 0.1999) circle (0.3pt); \filldraw[blue] (0.4000, 0.1830) circle (0.3pt); \filldraw[red] (0.8000, 0.3600) circle (0.3pt); \filldraw[blue] (0.8000, 0.3351) circle (0.3pt); \filldraw[red] (1.2000, 0.6253) circle (0.3pt); \filldraw[blue] (1.2000, 0.5907) circle (0.3pt); \filldraw[red] (1.6000, 1.0476) circle (0.3pt); \filldraw[blue] (1.6000, 1.0028) circle (0.3pt); \node at (0, 0.125) [left] {0.002}; \node at (0, 0.25) [left] {0.004}; \node at (0, 0.375) [left] {0.006}; \node at (0, 0.5) [left] {0.008}; \node at (0, 0.625) [left] {0.01}; \node at (0, 0.75) [left] {0.012}; \node at (0, 0.875) [left] {0.014}; \node at (0, 1.0) [left] {0.016}; \node at (0.0000, 0) [below] {45}; \node at (0.4000, 0) [below] {46}; \node at (0.8000, 0) [below] {47}; \node at (1.2000, 0) [below] {48}; \node at (1.6000, 0) [below] {49}; \end{tikzpicture} \end{center}
We can see there are small absolute differences, which are nonetheless large in percentage terms (as we found when we computed them directly).
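The size of the error in the first approximation can be checked directly. The sketch below, using only Python's standard library, compares the exact value of $p_h = \P(H_i < 48)$ with its continuity-corrected normal approximation; the helper names `phi` and `binom_cdf` are illustrative and not part of the original solution.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binom_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(x + 1))


M, m, p = 100, 48, 0.6

# Exact: P(H_i < m) = P(H_i <= m - 1).
p_h_exact = binom_cdf(M, p, m - 1)

# Normal approximation with continuity correction, as in the derivation.
p_h_approx = phi((2 * m - 1 - 2 * M * p) / (2 * sqrt(M * p * (1 - p))))

# Signed relative error of the approximation.
relative_error = (p_h_approx - p_h_exact) / p_h_exact
print(p_h_exact, p_h_approx, relative_error)
```

The absolute gap is of order $10^{-4}$, but relative to a tail probability this small it amounts to several percent, and that relative error is carried into $h = Lp_h$.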
\begin{center} \begin{tikzpicture}[scale=4] \draw[->] (0,0) -- (0,1.2); \draw[->] (0,0) -- (2,0); \filldraw[red] (0.0000, 0.2226) circle (0.3pt); \filldraw[blue] (0.0000, 0.2720) circle (0.3pt); \filldraw[red] (0.2500, 0.6449) circle (0.3pt); \filldraw[blue] (0.2500, 0.7331) circle (0.3pt); \filldraw[red] (0.5000, 0.9323) circle (0.3pt); \filldraw[blue] (0.5000, 0.9861) circle (0.3pt); \filldraw[red] (0.7500, 0.8967) circle (0.3pt); \filldraw[blue] (0.7500, 0.8825) circle (0.3pt); \filldraw[red] (1.0000, 0.6455) circle (0.3pt); \filldraw[blue] (1.0000, 0.5912) circle (0.3pt); \filldraw[red] (1.2500, 0.3710) circle (0.3pt); \filldraw[blue] (1.2500, 0.3161) circle (0.3pt); \filldraw[red] (1.5000, 0.1773) circle (0.3pt); \filldraw[blue] (1.5000, 0.1406) circle (0.3pt); \filldraw[red] (1.7500, 0.0725) circle (0.3pt); \filldraw[blue] (1.7500, 0.0535) circle (0.3pt); \filldraw[red] (2.0000, 0.0259) circle (0.3pt); \filldraw[blue] (2.0000, 0.0178) circle (0.3pt); \node at (0, 0.0) [left] {0.0}; \node at (0, 0.2) [left] {0.05}; \node at (0, 0.4) [left] {0.1}; \node at (0, 0.6) [left] {0.15}; \node at (0, 0.8) [left] {0.2}; \node at (0, 1.0) [left] {0.25}; \node at (0.0000, 0) [below] {0}; \node at (0.5000, 0) [below] {2}; \node at (1.0000, 0) [below] {4}; \node at (1.5000, 0) [below] {6}; \node at (2.0000, 0) [below] {8}; \end{tikzpicture} \end{center}
First, we can immediately see that the distributions of $B(L, p_h)$ (red) and $B(L, p_{h_\text{approx}})$ (blue) are quite different, even before we make any further approximations.
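To see how much the change in $p_h$ alone matters, here is a small sketch that computes the exact binomial tail $\P(C > 4)$ for $C \sim B(500, p)$ with each of the two values of $p$. The inputs are the six-decimal values quoted in the tables, so the outputs are only accurate to a few decimal places; `binom_pmf` and `tail` are illustrative helper names.

```python
from math import comb


def binom_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


L, l = 500, 4
p_exact = 0.005760   # exact p_h, truncated to six decimals
p_approx = 0.005362  # normal approximation to p_h, truncated to six decimals


def tail(p):
    """P(C > l) for C ~ B(L, p), with no further approximation."""
    return 1.0 - sum(binom_pmf(L, p, c) for c in range(l + 1))


# Point probabilities for c = 0..8, matching the range shown in the plot.
pmf_exact = [binom_pmf(L, p_exact, c) for c in range(9)]
pmf_approx = [binom_pmf(L, p_approx, c) for c in range(9)]

tail_exact = tail(p_exact)
tail_approx = tail(p_approx)
print(tail_exact, tail_approx)
```

No normal or Poisson step is involved here, so any difference between the two tails comes purely from the error in $p_h$.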
\begin{center} \begin{tikzpicture}[scale=4] \draw[->] (0,0) -- (0,1.2); \draw[->] (0,0) -- (2,0); \filldraw[red] (0.0000, 0.2226) circle (0.3pt); \filldraw[blue] (0.0000, 0.2276) circle (0.3pt); \filldraw[red] (0.2500, 0.6449) circle (0.3pt); \filldraw[blue] (0.2500, 0.5103) circle (0.3pt); \filldraw[red] (0.5000, 0.9323) circle (0.3pt); \filldraw[blue] (0.5000, 0.8150) circle (0.3pt); \filldraw[red] (0.7500, 0.8967) circle (0.3pt); \filldraw[blue] (0.7500, 0.9272) circle (0.3pt); \filldraw[red] (1.0000, 0.6455) circle (0.3pt); \filldraw[blue] (1.0000, 0.7514) circle (0.3pt); \filldraw[red] (1.2500, 0.3710) circle (0.3pt); \filldraw[blue] (1.2500, 0.4338) circle (0.3pt); \filldraw[red] (1.5000, 0.1773) circle (0.3pt); \filldraw[blue] (1.5000, 0.1783) circle (0.3pt); \filldraw[red] (1.7500, 0.0725) circle (0.3pt); \filldraw[blue] (1.7500, 0.0522) circle (0.3pt); \filldraw[red] (2.0000, 0.0259) circle (0.3pt); \filldraw[blue] (2.0000, 0.0109) circle (0.3pt); \node at (0, 0.0) [left] {0.0}; \node at (0, 0.2) [left] {0.05}; \node at (0, 0.4) [left] {0.1}; \node at (0, 0.6) [left] {0.15}; \node at (0, 0.8) [left] {0.2}; \node at (0, 1.0) [left] {0.25}; \node at (0.0000, 0) [below] {0}; \node at (0.5000, 0) [below] {2}; \node at (1.0000, 0) [below] {4}; \node at (1.5000, 0) [below] {6}; \node at (2.0000, 0) [below] {8}; \end{tikzpicture} \end{center}
Plotting the probability distribution of $B(L, p_h)$ (red) against that of $N(Lp_h, Lp_h(1-p_h))$ (blue), we find that it is not a great approximation.
\begin{center} \begin{tikzpicture}[scale=4] \draw[->] (0,0) -- (0,1.2); \draw[->] (0,0) -- (2,0); \filldraw[red] (0.0000, 0.0557) circle (0.3pt); \filldraw[blue] (0.0000, 0.0798) circle (0.3pt); \filldraw[red] (0.2500, 0.2169) circle (0.3pt); \filldraw[blue] (0.2500, 0.2073) circle (0.3pt); \filldraw[red] (0.5000, 0.4499) circle (0.3pt); \filldraw[blue] (0.5000, 0.4111) circle (0.3pt); \filldraw[red] (0.7500, 0.6741) circle (0.3pt); \filldraw[blue] (0.7500, 0.6429) circle (0.3pt); \filldraw[red] (1.0000, 0.8355) circle (0.3pt); \filldraw[blue] (1.0000, 0.8307) circle (0.3pt); \filldraw[red] (1.2500, 0.9282) circle (0.3pt); \filldraw[blue] (1.2500, 0.9392) circle (0.3pt); \filldraw[red] (1.5000, 0.9726) circle (0.3pt); \filldraw[blue] (1.5000, 0.9838) circle (0.3pt); \filldraw[red] (1.7500, 0.9907) circle (0.3pt); \filldraw[blue] (1.7500, 0.9968) circle (0.3pt); \filldraw[red] (2.0000, 0.9972) circle (0.3pt); \filldraw[blue] (2.0000, 0.9996) circle (0.3pt); \node at (0, 0.0) [left] {0.0}; \node at (0, 0.2) [left] {0.2}; \node at (0, 0.4) [left] {0.4}; \node at (0, 0.6) [left] {0.6}; \node at (0, 0.8) [left] {0.8}; \node at (0, 1.0) [left] {1.0}; \node at (0.0000, 0) [below] {0}; \node at (0.5000, 0) [below] {2}; \node at (1.0000, 0) [below] {4}; \node at (1.5000, 0) [below] {6}; \node at (2.0000, 0) [below] {8}; \end{tikzpicture} \end{center}
However, the CDF happens to be a very good approximation *just* for the value we care about, $l = 4$. This is very lucky, but not something anyone sitting STEP could have known at the time!
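The lucky agreement at $l = 4$ can be reproduced numerically. The following sketch tabulates the exact CDF of $B(L, p_h)$ against the continuity-corrected $N(Lp_h, Lp_h(1-p_h))$ CDF for $c = 0, \ldots, 8$, then evaluates the exact final answer. It assumes the truncated value $p_h = 0.005760$ from the table, so results are only good to a few decimal places, and the helper names are illustrative.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binom_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(x + 1))


K, k, L, l = 7, 2, 500, 4
p_h = 0.005760  # exact p_h from the table, truncated to six decimals
h = L * p_h
sd = sqrt(h * (1 - p_h))

# Absolute gap between the exact and normal CDFs at each c.
gaps = []
for c in range(9):
    exact = binom_cdf(L, p_h, c)
    normal = phi((c + 0.5 - h) / sd)
    gaps.append(abs(exact - normal))
    print(c, round(exact, 4), round(normal, 4))

# Exact final answer: binomial over the K days with q = P(C > l).
q_exact = 1.0 - binom_cdf(L, p_h, l)
answer_exact = comb(K, k) * q_exact**k * (1 - q_exact) ** (K - k)
print(answer_exact)
```

In this sketch the gap at $c = 4$ is noticeably smaller than at the neighbouring values $c = 3$ and $c = 5$, which is the coincidence described above.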