Year: 1988
Paper: 3
Question Number: 16
Course: LFM Stats And Pure
Section: Hypergeometric Distribution
Difficulty Rating: 1700.0
Difficulty Comparisons: 0
Banger Rating: 1610.5
Banger Comparisons: 11
Balls are chosen at random without replacement from an urn originally containing $m$ red balls and $M-m$ green balls. Find the probability that exactly $k$ red balls will be chosen in $n$ choices $(0\leqslant k\leqslant m,0\leqslant n\leqslant M).$
The random variables $X_{i}$ $(i=1,2,\ldots,n)$ are defined for $n\leqslant M$ by
\[
X_{i}=\begin{cases}
0 & \mbox{ if the $i$th ball chosen is green}\\
1 & \mbox{ if the $i$th ball chosen is red. }
\end{cases}
\]
Show that
\begin{questionparts}
\item $\mathrm{P}(X_{i}=1)=\dfrac{m}{M}.$
\item $\mathrm{P}(X_{i}=1\mbox{ and }X_{j}=1)=\dfrac{m(m-1)}{M(M-1)}$,
for $i\neq j$.
\end{questionparts}
Find the mean and variance of the random variable $X$ defined by
\[
X=\sum_{i=1}^{n}X_{i}.
\]
There are $\displaystyle \binom{m}{k} \binom{M-m}{n-k}$ ways to choose $k$ red and $n-k$ green balls, out of a total of $\displaystyle \binom{M}{n}$ equally likely ways to choose $n$ balls. Therefore the probability is:
\[ \mathbb{P}(\text{exactly }k\text{ red balls in }n\text{ choices}) = \frac{\binom{m}{k} \binom{M-m}{n-k}}{ \binom{M}{n}}\]
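As a quick numerical sanity check (not part of the original solution), this pmf can be evaluated with Python's \texttt{math.comb}; summing over all valid $k$ should give $1$, by Vandermonde's identity. The parameter values below are illustrative, not from the question:

```python
from math import comb

def hypergeom_pmf(k, n, m, M):
    """P(exactly k red balls in n draws) from an urn with m red and M - m green balls."""
    return comb(m, k) * comb(M - m, n - k) / comb(M, n)

# Example: M = 10 balls, m = 4 red, n = 5 draws.
M, m, n = 10, 4, 5
probs = [hypergeom_pmf(k, n, m, M) for k in range(0, min(m, n) + 1)]
print(sum(probs))  # approximately 1, by Vandermonde's identity
```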
\begin{questionparts}
\item Note that there is nothing special about the $i$th ball chosen: we could consider all possible sequences of draws and look at the $i$th ball, or apply a permutation to each sequence that moves the $i$th ball to the first position; either way we see identically distributed sequences. Therefore $\mathbb{P}(X_i = 1) = \mathbb{P}(X_1 = 1) = \frac{m}{M}$.
\item Similarly, we can apply a permutation to each sequence that takes the $i$th ball to the first position and the $j$th ball to the second, so:
\begin{align*}
\mathbb{P}(X_i = 1, X_j = 1) &= \mathbb{P}(X_1 = 1, X_2 = 1) \\
&= \mathbb{P}(X_1 = 1) \cdot \mathbb{P}(X_2 = 1 | X_1 = 1) \\
&= \frac{m}{M} \cdot \frac{m-1}{M-1} \\
&= \frac{m(m-1)}{M(M-1)}
\end{align*}
\end{questionparts}
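The symmetry argument in parts (i) and (ii) can be confirmed by brute force on a small urn (a check added here for illustration, with hypothetical parameters): enumerate all orderings of the balls and count, using exact rational arithmetic.

```python
from itertools import permutations
from fractions import Fraction

# Small urn: m = 3 red balls (1) and M - m = 2 green balls (0);
# all M! orderings of the distinct ball labels are equally likely.
m, M = 3, 5
balls = [1] * m + [0] * (M - m)
seqs = list(permutations(range(M)))

def p_red(i):
    """P(X_i = 1): fraction of orderings with a red ball in position i."""
    return Fraction(sum(balls[s[i]] for s in seqs), len(seqs))

def p_red_pair(i, j):
    """P(X_i = 1 and X_j = 1): red balls in both positions i and j."""
    return Fraction(sum(balls[s[i]] * balls[s[j]] for s in seqs), len(seqs))

# Every position gives m/M, and every pair gives m(m-1)/(M(M-1)).
assert all(p_red(i) == Fraction(m, M) for i in range(M))
assert all(p_red_pair(i, j) == Fraction(m * (m - 1), M * (M - 1))
           for i in range(M) for j in range(M) if i != j)
print(p_red(0), p_red_pair(0, 1))
```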
So:
\begin{align*}
\mathbb{E}(X) &= \mathbb{E}\left(\sum_{i=1}^{n}X_{i}\right) \\
&= \sum_{i=1}^{n}\mathbb{E}(X_{i}) \\
&= \sum_{i=1}^{n} 1\cdot\mathbb{P}(X_i = 1) \\
&= \sum_{i=1}^{n} \frac{m}{M} \\
&= \frac{mn}{M}
\end{align*}
and
\begin{align*}
\mathbb{E}(X^2) &= \mathbb{E}\left[\left(\sum_{i=1}^{n}X_{i} \right)^2 \right] \\
&= \mathbb{E}\left[\sum_{i=1}^n X_i^2 + 2 \sum_{i < j} X_i X_j \right] \\
&= \sum_{i=1}^n \mathbb{E}(X_i^2) + 2 \sum_{i < j} \mathbb{E}(X_i X_j) \\
&= \sum_{i=1}^n \mathbb{E}(X_i) + 2 \binom{n}{2} \mathbb{P}(X_1 = 1, X_2 = 1) \\
&= \frac{nm}{M} + n(n-1) \frac{m(m-1)}{M(M-1)} \\
\mathrm{Var}(X) &= \mathbb{E}(X^2) - (\mathbb{E}(X))^2 \\
&= \frac{nm}{M} + n(n-1) \frac{m(m-1)}{M(M-1)} - \frac{n^2m^2}{M^2} \\
&= \frac{nm}{M} \left (1-\frac{nm}{M}+(n-1)\frac{m-1}{M-1} \right) \\
&= \frac{nm}{M} \left ( \frac{M(M-1)-(M-1)nm+(n-1)(m-1)M}{M(M-1)} \right) \\
&= \frac{nm}{M} \frac{(M-m)(M-n)}{M(M-1)} \\
&= n \frac{m}{M} \frac{M-m}{M} \frac{M-n}{M-1}
\end{align*}
Note: this indicator-variable argument is a very nice way of deriving the mean and variance of the hypergeometric distribution, since it avoids summing over the pmf directly.
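To close the loop (again a check added for illustration, with made-up parameters), the closed forms $\mathbb{E}(X) = \frac{nm}{M}$ and $\mathrm{Var}(X) = n\,\frac{m}{M}\,\frac{M-m}{M}\,\frac{M-n}{M-1}$ can be compared against moments computed directly from the pmf:

```python
from math import comb

# Compare the derived mean and variance with moments of the exact pmf
# for one illustrative parameter choice: M = 12 balls, m = 5 red, n = 7 draws.
M, m, n = 12, 5, 7
pmf = {k: comb(m, k) * comb(M - m, n - k) / comb(M, n)
       for k in range(max(0, n - (M - m)), min(m, n) + 1)}
mean = sum(k * p for k, p in pmf.items())
var = sum(k * k * p for k, p in pmf.items()) - mean ** 2
print(mean, n * m / M)                                        # nm/M
print(var, n * (m / M) * ((M - m) / M) * ((M - n) / (M - 1)))  # closed form
```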