2013 Paper 3 Q12

Year: 2013
Paper: 3
Question Number: 12

Course: UFM Statistics
Section: Bivariate data

Difficulty: 1700.0 Banger: 1500.0

Problem

A list consists only of letters \(A\) and \(B\) arranged in a row. In the list, there are \(a\) letter \(A\)s and \(b\) letter \(B\)s, where \(a\ge2\) and \(b\ge2\), and \(a+b=n\). Each possible ordering of the letters is equally probable. The random variable \(X_1\) is defined by \[ X_1 = \begin{cases} 1 & \text{if the first letter in the row is \(A\)};\\ 0 & \text{otherwise.} \end{cases} \] The random variables \(X_k\) (\(2 \le k \le n\)) are defined by \[ X_k = \begin{cases} 1 & \text{if the \((k-1)\)th letter is \(B\) and the \(k\)th is \(A\)};\\ 0 & \text{otherwise.} \end{cases} \] The random variable \(S\) is defined by \(S = \sum\limits_ {i=1}^n X_i\,\).
  1. Find expressions for \(\E(X_i)\), distinguishing between the cases \(i=1\) and \(i\ne1\), and show that \(\E(S)= \dfrac{a(b+1)}n\,\).
  2. Show that:
    1. for \(j\ge3\), \(\E(X_1X_j) = \dfrac{a(a-1)b}{n(n-1)(n-2)}\,\);
    2. \[ \sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n \E(X_iX_j)\bigg) = \dfrac{a(a-1)b(b-1)}{2n(n-1)}\,\]
    3. \(\var(S) = \dfrac {a(a-1)b(b+1)}{n^2(n-1)}\,\).

Solution

  1. Notice that \(\E[X_1] = \frac{a}{n}\) and consider \(\E[X_i]\) with \(i > 1\). the probability that this is \(1\) is \(\frac{b}{n} \cdot \frac{a}{n-1}\). So \begin{align*} && \E[S] &= \E[X_1] + \sum_{i=2}^n \E[X_i] \\ &&&= \frac{a}{n} + (n-1) \frac{ab}{n(n-1)} \\ &&&= \frac{a(b+1)}{n} \end{align*}
    1. The probability \(X_1X_j = 1\) is \(\frac{a}{n} \cdot \frac{b}{n-1} \cdot \frac{a-1}{n-2} = \frac{a(a-1)b}{n(n-1)(n-2)}\) since there is nothing special about the order, and the first is an \(A\) with probability \(\frac{a}{n}\) and given this occurs there are now \(a-1\) \(A\) and \(n-1\) letters left etc... Therefore \(\E[X_1X_j] = \frac{a(a-1)b}{n(n-1)(n-2)}\)
    2. \(\E[X_iX_j]\) when the pairs don't overlap is \(\frac{a}{n} \frac{b}{n-1} \frac{a-1}{n-2} \frac{b-1}{n-3}\), and so \begin{align*} && \sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n \E(X_iX_j)\bigg) &= \sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)}\bigg) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)}\sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n 1\bigg) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)}\sum\limits_{i=2}^{n-2} (n-(i+1)) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)} \left ((n-1)(n-3)-\frac{(n-2)(n-1)}{2}+1 \right) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)} \left ( \frac{2n^2-8n-6-n^2+3n-2+2}{2}\right) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)} \left ( \frac{n^2-5n-6}{2}\right) \\ &&&= \frac{a(a-1)b(b-1)}{2n(n-1)} \end{align*}
    3. We also need to consider the other cross terms. \(X_iX_{i+1}=0\). (Since \(X_i = 1\) means the \(i\)th letter is \(A\) and \(X_{i+1} = 1\) means the \(i\)th letter is \(B\)). It's the same story for \(X_1X_2\), and so all the cross terms are accounted for. Therefore \begin{align*} && \E[S^2] &= \E \left [\sum X_i^2 + 2\sum_{i \neq j} X_i X_j \right] \\ &&&= \frac{a(b+1)}{n} +2(n-2)\frac{a(a-1)b}{n(n-1)(n-2)}+ 2 \frac{a(a-1)b(b-1)}{2n(n-1)} \\ &&&= \frac{a(b+1)}{n} +\frac{2a(a-1)b}{n(n-1)} + \frac{a(a-1)b(b-1)}{n(n-1)} \\ &&&= \frac{a(b+1)}{n} +\frac{a(a-1)b(b+1)}{n(n-1)} \\ && \var[S] &= \E[S^2] - \left ( \E[S] \right)^2 \\ &&&= \frac{a(b+1)}{n} + \frac{a(a-1)b(b+1)}{n(n-1)} - \frac{a^2(b+1)^2}{n^2} \\ &&&= \frac{a(b+1) \left (n(n-1) + (a-1)b n -a(b+1)(n-1) \right)}{n^2(n-1)} \\ &&&= \frac{a(b+1) \left ( (n-a)(n-b-1) \right)}{n^2(n-1)} \\ &&&= \frac{a(b+1) \left ( b(a-1) \right)}{n^2(n-1)} \\ \end{align*}
Examiner's report
— 2013 STEP 3, Question 12
Mean: ~5.7 / 20 (inferred) ~11% attempted (inferred) Inferred ~5.7/20 from 'slightly less success than Q8' (Q8 ≈ 6.7, −1.0); inferred 11% from 'a ninth'

This was the least popular question, attempted by a ninth of the candidates, with slightly less success than question 8. The immediate problem was many made no mention of probabilities in order to calculate expectations. Throughout, there was very poor justification, which included treating the random variables as though they were independent and compensating errors which led to given results. Most progressed no further than part (a) of (ii) at best and many had incorrect expressions.

With the number of candidates submitting scripts up by some 8% from last year, and whilst inevitably some questions were more popular than others, namely the first two, 7 then 4 and 5 to a lesser extent, all questions on the paper were attempted by a significant number of candidates. About a sixth of candidates gave in answers to more than six questions, but the extra questions were invariably scoring negligible marks. Two fifths of the candidates gave in answers to six questions.

Source: Cambridge STEP 2013 Examiner's Report · 2013-full.pdf
Rating Information

Difficulty Rating: 1700.0

Difficulty Comparisons: 0

Banger Rating: 1500.0

Banger Comparisons: 0

Show LaTeX source
Problem source
A list consists only of letters $A$ and $B$ arranged in a row. In the list, there are $a$ letter $A$s and $b$ letter $B$s, where $a\ge2$ and $b\ge2$, and $a+b=n$. Each possible ordering of the letters is equally probable. The random variable $X_1$ is defined by
\[
X_1 = 
\begin{cases}
1 & \text{if the first letter in the row is $A$};\\
0 & \text{otherwise.}
\end{cases}
\]
The random variables $X_k$ ($2 \le k \le n$) are defined by
\[
X_k = 
\begin{cases}
1 & \text{if the $(k-1)$th  letter  is $B$ and the $k$th is $A$};\\
0 & \text{otherwise.}
\end{cases}
\]
The random variable $S$ is defined by $S = \sum\limits_ {i=1}^n X_i\,$.
\begin{questionparts}
\item Find expressions for $\E(X_i)$, 
distinguishing between the cases $i=1$ and $i\ne1$, and show
that $\E(S)= \dfrac{a(b+1)}n\,$.
\item Show that:
\begin{enumerate}
\item for $j\ge3$,  $\E(X_1X_j) = \dfrac{a(a-1)b}{n(n-1)(n-2)}\,$;
\item \[ \sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n \E(X_iX_j)\bigg)
= \dfrac{a(a-1)b(b-1)}{2n(n-1)}\,\]
\item $\var(S) = \dfrac {a(a-1)b(b+1)}{n^2(n-1)}\,$.
\end{enumerate}
\end{questionparts}
Solution source
\begin{questionparts}
\item Notice that $\E[X_1] = \frac{a}{n}$ and consider $\E[X_i]$ with $i > 1$. the probability that this is $1$ is $\frac{b}{n} \cdot \frac{a}{n-1}$. So

\begin{align*}
&& \E[S] &= \E[X_1] + \sum_{i=2}^n \E[X_i] \\
&&&= \frac{a}{n} + (n-1) \frac{ab}{n(n-1)} \\
&&&= \frac{a(b+1)}{n}
\end{align*}

\item \begin{enumerate}
\item The probability $X_1X_j = 1$ is $\frac{a}{n} \cdot \frac{b}{n-1} \cdot \frac{a-1}{n-2} = \frac{a(a-1)b}{n(n-1)(n-2)}$ since there is nothing special about the order, and the first is an $A$ with probability $\frac{a}{n}$ and given this occurs there are now $a-1$ $A$ and $n-1$ letters left etc... Therefore $\E[X_1X_j] =  \frac{a(a-1)b}{n(n-1)(n-2)}$

\item $\E[X_iX_j]$ when the pairs don't overlap is $\frac{a}{n} \frac{b}{n-1} \frac{a-1}{n-2} \frac{b-1}{n-3}$, and so

\begin{align*}
&&  \sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n \E(X_iX_j)\bigg) &=  \sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)}\bigg) \\
&&&=  \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)}\sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n 1\bigg) \\
&&&=  \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)}\sum\limits_{i=2}^{n-2} (n-(i+1)) \\
&&&=  \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)} \left ((n-1)(n-3)-\frac{(n-2)(n-1)}{2}+1 \right) \\
&&&=  \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)} \left ( \frac{2n^2-8n-6-n^2+3n-2+2}{2}\right) \\
&&&=  \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)} \left ( \frac{n^2-5n-6}{2}\right) \\
&&&= \frac{a(a-1)b(b-1)}{2n(n-1)}
\end{align*}

\item We also need to consider the other cross terms. $X_iX_{i+1}=0$. (Since $X_i = 1$ means the $i$th letter is $A$ and $X_{i+1} = 1$ means the $i$th letter is $B$). It's the same story for $X_1X_2$, and so all the cross terms are accounted for. Therefore

\begin{align*}
&& \E[S^2] &= \E \left [\sum X_i^2 + 2\sum_{i \neq j} X_i X_j \right] \\
&&&= \frac{a(b+1)}{n} +2(n-2)\frac{a(a-1)b}{n(n-1)(n-2)}+ 2  \frac{a(a-1)b(b-1)}{2n(n-1)} \\
&&&= \frac{a(b+1)}{n} +\frac{2a(a-1)b}{n(n-1)} +  \frac{a(a-1)b(b-1)}{n(n-1)} \\
&&&= \frac{a(b+1)}{n} +\frac{a(a-1)b(b+1)}{n(n-1)} 
\\
&& \var[S] &= \E[S^2] - \left ( \E[S] \right)^2 \\
&&&= \frac{a(b+1)}{n} +   \frac{a(a-1)b(b+1)}{n(n-1)} - \frac{a^2(b+1)^2}{n^2} \\
&&&= \frac{a(b+1) \left (n(n-1) + (a-1)b n -a(b+1)(n-1) \right)}{n^2(n-1)} \\
&&&= \frac{a(b+1) \left ( (n-a)(n-b-1) \right)}{n^2(n-1)} \\
&&&= \frac{a(b+1) \left ( b(a-1) \right)}{n^2(n-1)} \\
\end{align*}
\end{enumerate}

\end{questionparts}