Bivariate data

2018 Paper 3 Q12

A random process generates, independently, \(n\) numbers each of which is drawn from a uniform (rectangular) distribution on the interval 0 to 1. The random variable \(Y_k\) is defined to be the \(k\)th smallest number (so there are \(k-1\) smaller numbers).

  1. Show that, for \(0\le y\le1\,\), \[ {\rm P}\big(Y_k\le y\big) =\sum^{n}_{m=k}\binom{n}{m}y^{m}\left(1-y\right)^{n-m} . \tag{\(*\)} \]
  2. Show that \[ m\binom n m = n \binom {n-1}{m-1} \] and obtain a similar expression for \(\displaystyle (n-m) \, \binom n m\,\). Starting from \((*)\), show that the probability density function of \(Y_k\) is \[ n\binom{ n-1}{k-1} y^{k-1}\left(1-y\right)^{ n-k} \,.\] Deduce an expression for \(\displaystyle \int_0^1 y^{k-1}(1-y)^{n-k} \, \d y \,\).
  3. Find \(\E(Y_k) \) in terms of \(n\) and \(k\).

Solution
  1. Each of the \(n\) numbers is less than \(y\) independently with probability \(y\), so the number of values less than \(y\) is binomially distributed: \begin{align*} && \mathbb{P}(Y_k \leq y) &= \sum_{m=k}^n\mathbb{P}(\text{exactly }m \text{ values less than }y) \\ &&&= \sum_{m=k}^n \binom{n}{m} y^m(1-y)^{n-m} \end{align*}
  2. Count, from \(n\) people, the ways to choose a committee of \(m\) with a designated chair. This can be done in two ways. First: choose the committee in \(\binom{n}{m}\) ways and then the chair from its members in \(m\) ways, giving \(m \binom{n}{m}\). Alternatively, choose the chair in \(n\) ways and the remaining \(m-1\) committee members in \(\binom{n-1}{m-1}\) ways. Therefore \(m \binom{n}{m} = n \binom{n-1}{m-1}\). Similarly, \begin{align*} (n-m) \binom{n}{m} &= (n-m) \binom{n}{n-m} \\ &= n \binom{n-1}{n-m-1} \\ &= n \binom{n-1}{m} \end{align*} \begin{align*} f_{Y_k}(y) &= \frac{\d }{\d y} \l \sum^{n}_{m=k}\binom{n}{m}y^{m}\left(1-y\right)^{n-m} \r \\ &= \sum^{n}_{m=k} \l \binom{n}{m}my^{m-1}\left(1-y\right)^{n-m} -\binom{n}{m}(n-m)y^{m}\left(1-y\right)^{n-m-1} \r \\ &= \sum^{n}_{m=k} \l n \binom{n-1}{m-1}y^{m-1}\left(1-y\right)^{n-m} -n \binom{n-1}{m} y^{m}\left(1-y\right)^{n-m-1} \r \\ &= n\sum^{n}_{m=k} \binom{n-1}{m-1}y^{m-1}\left(1-y\right)^{n-m} -n\sum^{n+1}_{m=k+1} \binom{n-1}{m-1} y^{m-1}\left(1-y\right)^{n-m} \\ &= n \binom{n-1}{k-1} y^{k-1}(1-y)^{n-k} \end{align*} (In the second sum the index has been shifted up by one; its \(m=n+1\) term vanishes because \(\binom{n-1}{n}=0\), so the two sums cancel term by term, leaving only the \(m=k\) term of the first.) \begin{align*} &&1 &= \int_0^1 f_{Y_k}(y) \d y \\ &&&= \int_0^1 n \binom{n-1}{k-1} y^{k-1}(1-y)^{n-k} \d y \\ &&&= n \binom{n-1}{k-1} \int_0^1 y^{k-1}(1-y)^{n-k} \d y \\ \Rightarrow && \frac{1}{n \binom{n-1}{k-1}} &= \int_0^1 y^{k-1}(1-y)^{n-k} \d y \\ \end{align*}
  3. \begin{align*} && \mathbb{E}(Y_k) &= \int_0^1 y f_{Y_k}(y) \d y \\ &&&= \int_0^1 n \binom{n-1}{k-1} y^{k}(1-y)^{n-k} \d y \\ &&&= n \binom{n-1}{k-1}\int_0^1 y^{k}(1-y)^{n-k} \d y \\ &&&= n \binom{n-1}{k-1}\int_0^1 y^{k+1-1}(1-y)^{n+1-(k+1)} \d y \\ &&&= n \binom{n-1}{k-1} \frac{1}{(n+1) \binom{n}{k}}\\ &&&= \frac{n}{n+1} \cdot \frac{k}{n} \\ &&&= \frac{k}{n+1} \end{align*} using the deduced integral with \(n+1\) and \(k+1\) in place of \(n\) and \(k\), together with \(\binom{n}{k} = \frac{n}{k}\binom{n-1}{k-1}\).
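The closed form \(\E(Y_k) = \frac{k}{n+1}\) is easy to sanity-check by simulation. A minimal Monte Carlo sketch (the function name and parameters are ours, not part of the question):

```python
# Illustrative check, not part of the solution: estimate E(Y_k) for uniform
# order statistics by simulation and compare with k/(n+1).
import random

def mean_kth_smallest(n, k, trials=20000, seed=0):
    """Estimate E(Y_k) for n independent U(0,1) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        total += sample[k - 1]   # the k-th smallest of the n draws
    return total / trials
```

For example, `mean_kth_smallest(5, 2)` should land close to \(\frac{2}{6} = \frac13\).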
2017 Paper 3 Q12

The discrete random variables \(X\) and \(Y\) can each take the values \(1\), \(\ldots\,\), \(n\) (where \(n\ge2\)). Their joint probability distribution is given by \[ \P(X=x, \ Y=y) = k(x+y) \,, \] where \(k\) is a constant.

  1. Show that \[ \P(X=x) = \dfrac{n+1+2x}{2n(n+1)}\,. \] Hence determine whether \(X\) and \(Y\) are independent.
  2. Show that the covariance of \(X\) and \(Y\) is negative.

Solution
  1. \(\,\) \begin{align*} && \mathbb{P}(X = x) &= \sum_{y=1}^n \mathbb{P}(X=x,Y=y) \\ &&&= \sum_{y=1}^n k(x+y) \\ &&&= nkx + k\frac{n(n+1)}2 \\ \\ && 1 &= \sum_{x=1}^n \mathbb{P}(X=x) \\ &&&= nk\frac{n(n+1)}{2} + kn\frac{n(n+1)}2 \\ &&&= kn^2(n+1) \\ \Rightarrow && k &= \frac{1}{n^2(n+1)} \\ \Rightarrow && \mathbb{P}(X = x) &= \frac{nx}{n^2(n+1)} + \frac{n(n+1)}{2n^2(n+1)} \\ &&&= \frac{n+1+2x}{2n(n+1)} \\ \\ && \mathbb{P}(X=x)\mathbb{P}(Y=y) &= \frac{(n+1)^2+2(n+1)(x+y)+4xy}{4n^2(n+1)^2} \\ &&&\neq \frac{x+y}{n^2(n+1)} \end{align*} Therefore \(X\) and \(Y\) are not independent.
  2. \(\,\) \begin{align*} && \E[X] &= \sum_{x=1}^n x \mathbb{P}(X=x) \\ &&&= \sum_{x=1}^n x \frac{n+1+2x}{2n(n+1)} \\ &&&= \frac{1}{2n(n+1)} \left ( (n+1) \sum x + 2\sum x^2\right)\\ &&&= \frac{1}{2n(n+1)} \left ( \frac{n(n+1)^2}{2} + \frac{n(n+1)(2n+1)}{3} \right) \\ &&&= \frac{1}{2} \left ( \frac{n+1}{2} + \frac{2n+1}{3} \right)\\ &&&= \frac{7n+5}{12} \\ \\ && \textrm{Cov}(X,Y) &= \mathbb{E}\left[XY\right] - \E[X] \E[Y] \\ &&&= \sum_{x=1}^n \sum_{y=1}^n xy \frac{x+y}{n^2(n+1)} - \E[X]^2 \\ &&&= \frac{1}{n^2(n+1)} \sum_x \sum_y (x^2 y+xy^2) - \E[X]^2 \\ &&&= \frac{2}{n^2(n+1)} \left (\sum x^2 \right )\left (\sum x\right ) - \E[X]^2 \\ &&&=\frac{(n+1)(2n+1)}{6} - \left ( \frac{7n+5}{12}\right)^2 \\ &&&= \frac1{144} \left (24(2n^2+3n+1) - (49n^2+70n+25) \right)\\ &&&= -\frac{(n-1)^2}{144} \\ &&& < 0 \end{align*} since \(n \geq 2\). (Here \(\E[Y] = \E[X]\) and \(\sum_x\sum_y x^2y = \sum_x\sum_y xy^2 = \left(\sum x^2\right)\left(\sum x\right)\) by the symmetry of the distribution in \(x\) and \(y\).)
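As an exact-arithmetic cross-check of part 2 (illustrative code, not part of the solution), the covariance can be evaluated directly from the joint pmf; it simplifies to \(-(n-1)^2/144\), which is negative for every \(n \ge 2\).

```python
# Exact check: Cov(X, Y) for the joint pmf P(X=x, Y=y) = k(x+y) on {1,...,n}^2
# should equal -(n-1)^2/144.
from fractions import Fraction

def covariance(n):
    k = Fraction(1, n * n * (n + 1))
    # E(X); by symmetry E(Y) is the same
    ex = sum(x * k * (x + y)
             for x in range(1, n + 1) for y in range(1, n + 1))
    exy = sum(x * y * k * (x + y)
              for x in range(1, n + 1) for y in range(1, n + 1))
    return exy - ex * ex

checks = [covariance(n) == -Fraction((n - 1) ** 2, 144) for n in range(2, 8)]
```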
2016 Paper 3 Q13

Given a random variable \(X\) with mean \(\mu\) and standard deviation \(\sigma\), we define the kurtosis, \(\kappa\), of \(X\) by \[ \kappa = \frac{ \E\big((X-\mu)^4\big)}{\sigma^4} -3 \,. \] Show that the random variable \(X-a\), where \(a\) is a constant, has the same kurtosis as \(X\).

  1. Show by integration that a random variable which is Normally distributed with mean 0 has kurtosis 0.
  2. Let \(Y_1, Y_2, \ldots, Y_n\) be \(n\) independent, identically distributed, random variables with mean 0, and let \(T = \sum\limits_{r=1}^n Y_r\). Show that \[ \E(T^4) = \sum_{r=1}^n \E(Y_r^4) + 6 \sum_{r=1}^{n-1} \sum_{s=r+1}^{n} \E(Y^2_s) \E(Y^2_r) \,. \]
  3. Let \(X_1\), \(X_2\), \(\ldots\)\,, \(X_n\) be \(n\) independent, identically distributed, random variables each with kurtosis \(\kappa\). Show that the kurtosis of their sum is \(\dfrac\kappa n\,\).

Solution
\begin{align*} &&\kappa_{X-a} &= \frac{\mathbb{E}\left(\left(X-a-(\mu-a)\right)^4\right)}{\sigma_{X-a}^4}-3 \\ &&&= \frac{\mathbb{E}\left(\left(X-\mu\right)^4\right)}{\sigma_X^4}-3\\ &&&= \kappa_X \end{align*}
  1. \(\,\) \begin{align*} && \kappa &= \frac{\mathbb{E}((X-\mu)^4)}{\sigma^4} - 3 \\ &&&= \frac{\mathbb{E}((\mu+\sigma Z-\mu)^4)}{\sigma^4} - 3 \\ &&&= \frac{\mathbb{E}((\sigma Z)^4)}{\sigma^4} - 3 \\ &&&= \mathbb{E}(Z^4)-3\\ &&&= \int_{-\infty}^{\infty} x^4\frac{1}{\sqrt{2\pi}} \exp \left ( - \frac12x^2 \right)\d x -3 \\ &&&= \left [\frac{1}{\sqrt{2\pi}}x^{3} \cdot \left ( -\exp \left ( - \frac12x^2 \right)\right) \right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty 3x^2 \exp \left ( - \frac12x^2 \right) \d x - 3 \\ &&&= 0 + 3 \textrm{Var}(Z) - 3 =0 \end{align*}
  2. \(\,\) \begin{align*} && \mathbb{E}(T^4) &= \mathbb{E} \left [\left ( \sum\limits_{r=1}^n Y_r\right)^4\right] \\ &&&= \mathbb{E} \left [ \sum_{r=1}^n Y_r^4+4\sum_{i\neq j} Y_iY_j^3+6\sum_{i< j} Y_i^2Y_j^2+12\sum_{i\neq j \neq k} Y_iY_jY_k^2 +24\sum_{i\neq j\neq k \neq l} Y_iY_jY_kY_l\right] \\ &&&= \sum_{r=1}^n \mathbb{E} \left [ Y_r^4 \right]+4\sum_{i\neq j} \mathbb{E} \left [ Y_i\right]\mathbb{E}\left[Y_j^3\right]+6\sum_{i< j} \mathbb{E} \left [ Y_i^2\right]\mathbb{E}\left[Y_j^2\right]+12\sum_{i\neq j \neq k} \mathbb{E} \left [ Y_i\right]\mathbb{E}\left[Y_j\right]\mathbb{E}\left[Y_k^2\right] +24\sum_{i\neq j\neq k \neq l} \mathbb{E} \left [ Y_i\right]\mathbb{E}\left[Y_j\right]\mathbb{E}\left[Y_k\right]\mathbb{E}\left[Y_l\right] \\ &&&= \sum_{r=1}^n \mathbb{E} \left [ Y_r^4 \right]+6\sum_{i< j} \mathbb{E} \left [ Y_i^2\right]\mathbb{E}\left[Y_j^2\right] \end{align*} Here the sums run over distinct indices (the quadratic-pair sum over unordered pairs \(i<j\)); independence lets each expectation factorise, and every term containing a bare factor \(\mathbb{E}\left[Y_i\right] = 0\) vanishes.
  3. Without loss of generality, we may assume the \(X_i\) all have mean zero (subtracting a constant changes neither the kurtosis, as shown above, nor the variance). Therefore we can consider the situation as in the previous part with \(T\) and the \(Y_i\). Note that \(\mathbb{E}(Y_i^4) = \sigma^4(\kappa + 3)\) and \(\textrm{Var}(T) = n \sigma^2\). \begin{align*} && \kappa_T &= \frac{\mathbb{E}(T^4)}{(\textrm{Var}(T))^2} - 3 \\ &&&= \frac{\sum_{r=1}^n \mathbb{E} \left [ Y_r^4 \right]+6\sum_{i< j} \mathbb{E} \left [ Y_i^2\right]\mathbb{E}\left[Y_j^2\right]}{n^2\sigma^4}-3 \\ &&&= \frac{n\sigma^4(\kappa+3)+6\binom{n}{2}\sigma^4}{n^2\sigma^4} -3\\ &&&= \frac{\kappa}{n} + \frac{3n + \frac{6n(n-1)}{2}}{n^2} - 3 \\ &&&= \frac{\kappa}{n} + \frac{3n^2}{n^2}-3 \\ &&&= \frac{\kappa}{n} \end{align*}
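Part 3 can be checked with a concrete distribution. If \(Y\) takes the values \(\pm1\) with probability \(\frac12\), then \(\mu = 0\), \(\sigma^2 = 1\) and \(\kappa = \E(Y^4) - 3 = -2\), so a sum of \(n\) independent copies should have kurtosis \(-2/n\). A small exact-enumeration sketch (the \(\pm1\) set-up is our own illustration, not from the question):

```python
# Exact kurtosis of T = Y_1 + ... + Y_n, with Y_i = ±1 equally likely;
# expect -2/n by the result just proved.
from fractions import Fraction
from itertools import product

def kurtosis_of_sum(n):
    p = Fraction(1, 2 ** n)          # each sign vector equally likely
    m2 = m4 = Fraction(0)
    for signs in product((-1, 1), repeat=n):
        t = sum(signs)
        m2 += p * t ** 2             # E(T^2) = Var(T), since E(T) = 0
        m4 += p * t ** 4
    return m4 / m2 ** 2 - 3

results = [kurtosis_of_sum(n) for n in (1, 2, 3, 4)]
```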
2015 Paper 3 Q13

Each of the two independent random variables \(X\) and \(Y\) is uniformly distributed on the interval \([0,1]\).

  1. By considering the lines \(x+y =\) \(\mathrm{constant}\) in the \(x\)-\(y\) plane, find the cumulative distribution function of \(X+Y\).
  2. Hence show that the probability density function \(f\) of \((X+Y)^{-1}\) is given by \[ f(t) = \begin{cases} 2t^{-2} -t^{-3} & \text{for \( \tfrac12 \le t \le 1\)} \\ t^{-3} & \text{for \(1\le t <\infty\)}\\ 0 & \text{otherwise}. \end{cases} \] Evaluate \(\E\Big(\dfrac1{X+Y}\Big)\,\).
  3. Find the cumulative distribution function of \(Y/X\) and use this result to find the probability density function of \(\dfrac X {X+Y}\). Write down \(\E\Big( \dfrac X {X+Y}\Big)\) and verify your result by integration.

Solution
  1. For \(0 \leq c \leq 1\), \(\mathbb{P}(X + Y \leq c) \) is the area of the triangle cut off the unit square by the line \(x + y = c\); for \(1 \leq c \leq 2\) it is the complement of the corresponding triangle above the line: \[\mathbb{P}(X + Y \leq c) = \begin{cases} 0 & \text{ if } c \leq 0 \\ \frac{c^2}{2} & \text{ if } 0 \leq c \leq 1 \\ 1- \frac{(2-c)^2}{2} & \text{ if } 1 \leq c \leq 2 \\ 1 & \text{ otherwise} \end{cases}\]
  2. \begin{align*} && \mathbb{P}((X + Y)^{-1} \leq t) &= 1- \mathbb{P}(X + Y \leq \frac1{t}) \\ \Rightarrow && f_{(X+Y)^{-1}}(t) &= 0 -\begin{cases} 0 & \text{ if } \frac1{t} \leq 0 \\ \frac{\d}{\d t}\frac{1}{2t^2} & \text{ if } \frac{1}{t} \leq 1 \\ \frac{\d}{\d t} \l 1- \frac{(2-\frac1t)^2}{2} \r & \text{ if } 1 \leq \frac{1}{t} \leq 2 \\ 0 & \text{ otherwise}\end{cases} \\ && &= \begin{cases} t^{-3} & \text{ if } t \geq 1 \\ (2-\frac1t)t^{-2} & \text{ if } \frac12 \leq t \leq 1\\ 0 & \text{ otherwise}\end{cases} \\ && &= \begin{cases} t^{-3} & \text{ if } t \geq 1 \\ 2t^{-2}-t^{-3} & \text{ if } \frac12 \leq t \leq 1\\ 0 & \text{ otherwise}\end{cases} \end{align*} Therefore, \begin{align*} \E \Big(\dfrac1{X+Y}\Big) &= \int_{\frac12}^{\infty} t f_{(X+Y)^{-1}}(t) \, \d t \\ &= \int_{\frac12}^{1} t f_{(X+Y)^{-1}}(t) \, \d t + \int_{1}^{\infty} t f_{(X+Y)^{-1}}(t) \d t\\ &= \int_{\frac12}^{1} \l 2t^{-1} - t^{-2} \r \, \d t + \int_{1}^{\infty} t^{-2} \d t\\ &= \left [ 2 \ln (t) + t^{-1} \right]_{\frac12}^{1} + \left [ -t^{-1} \right ]_{1}^{\infty} \\ &= 1 + 2 \ln 2 -2 + 1 \\ &= 2 \ln 2 \end{align*}
  3. \begin{align*} &&\mathbb{P} \l \frac{Y}{X} \leq c \r &= \mathbb{P}( Y \leq c X) \\ &&&= \begin{cases} 0 & \text{if } c \leq 0 \\ \frac{c}{2} & \text{if } 0 \leq c \leq 1 \\ 1-\frac{1}{2c} & \text{if } 1 \leq c \end{cases} \\ \\ \Rightarrow && \mathbb{P} \l \frac{X}{X+Y} \leq t\r &= \mathbb{P} \l \frac{1}{1+\frac{Y}{X}} \leq t\r \\ &&&= \mathbb{P} \l \frac{1}{t} \leq 1+\frac{Y}{X}\r \\ &&&= \mathbb{P} \l \frac{1}{t} - 1\leq \frac{Y}{X}\r \\ &&&= 1- \mathbb{P} \l \frac{Y}{X} \leq \frac{1}{t} - 1\r \\ &&&= 1 - \begin{cases} 0 & \text{if } \frac1{t} - 1 \leq 0 \text{, i.e. } t \geq 1 \\ \frac{1}{2t} - \frac{1}{2} & \text{if } 0 \leq \frac1{t} - 1 \leq 1 \text{, i.e. } \frac12 \leq t \leq 1 \\ 1-\frac{t}{2-2t} & \text{if } 1 \leq \frac1{t} - 1 \text{, i.e. } 0 < t \leq \frac12 \end{cases} \\ && f_{\frac{X}{X+Y}}(t) &= \begin{cases} \frac{1}{2(1-t)^2} & \text{if } 0 \leq t \leq \frac12 \\ \frac{1}{2t^2} & \text{if } \frac12 \leq t \leq 1 \\ 0 & \text{otherwise} \end{cases} \\ \Rightarrow && \mathbb{E} \l \frac{X}{X+Y} \r &= \int_0^1 t f(t) \d t \\ &&&= \int_0^{\frac12} \frac{t}{2(1-t)^2} \d t + \int_{\frac12}^1 \frac{1}{2t} \d t \\ &&&= \left [ \frac{1}{2(1-t)} + \frac12 \ln(1-t) \right]_0^{\frac12} + \left[ \frac12 \ln t \right]_{\frac12}^1 \\ &&& = \frac{1-\ln 2}{2} + \frac{\ln 2}{2} = \frac{1}{2} \\ \\ && \mathbb{E} \l \frac{X}{X+Y} \r &= \int_0^1 \int_0^1 \frac{x}{x+y} \d y\d x \\ &&&= \int_0^1 \l x \ln (x+1) - x \ln x \r \d x \\ &&&= \left [\frac{x^2}2 \ln(x+1) - \frac{x^2}{2} \ln(x) \right]_0^1 -\int_0^1 \l \frac{x^2}{2(x+1)} - \frac{x}{2} \r \d x \\ &&&= \frac{\ln 2}{2} + \frac{1}{4} - \int_0^1 \frac{x^2-1+1}{2(x+1)}\d x \\ &&&= \frac{\ln 2}{2} + \frac{1}{4} - \int_0^1 \l \frac{x -1}{2} + \frac{1}{2(x+1)} \r \d x \\ &&&= \frac{\ln 2}{2} + \frac{1}{4} + \frac{1}{4} - \frac{\ln 2}{2} \\ &&&= \frac{1}{2} \end{align*} Alternatively, \(1 = \mathbb{E} \l \frac{X+Y}{X+Y} \r = \mathbb{E} \l \frac{X}{X+Y} \r + \mathbb{E} \l \frac{Y}{X+Y} \r = 2 \mathbb{E} \l \frac{X}{X+Y} \r\), since \(X\) and \(Y\) are identically distributed and the expectations exist.
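Both expectations can be checked numerically against the derived densities. A rough midpoint-rule sketch (the grid size and truncation point are arbitrary choices of ours) should reproduce \(\E\big(\frac1{X+Y}\big) = 2\ln 2 \approx 1.386\) and \(\E\big(\frac{X}{X+Y}\big) = \frac12\):

```python
# Illustrative numerical check of the two densities derived above.
import math

def midpoint(f, a, b, steps=200000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# density of 1/(X+Y): 2t^-2 - t^-3 on [1/2, 1], t^-3 on [1, infinity);
# truncating the upper limit at t = 2000 discards only ~1/2000 of the mean
f_inv = lambda t: 2 * t ** -2 - t ** -3 if t <= 1 else t ** -3
e_inv = midpoint(lambda t: t * f_inv(t), 0.5, 2000.0)

# density of X/(X+Y): 1/(2(1-t)^2) on [0, 1/2], 1/(2t^2) on [1/2, 1]
f_ratio = lambda t: 1 / (2 * (1 - t) ** 2) if t <= 0.5 else 1 / (2 * t ** 2)
e_ratio = midpoint(lambda t: t * f_ratio(t), 0.0, 1.0)
```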
2013 Paper 3 Q12

A list consists only of letters \(A\) and \(B\) arranged in a row. In the list, there are \(a\) letter \(A\)s and \(b\) letter \(B\)s, where \(a\ge2\) and \(b\ge2\), and \(a+b=n\). Each possible ordering of the letters is equally probable. The random variable \(X_1\) is defined by \[ X_1 = \begin{cases} 1 & \text{if the first letter in the row is \(A\)};\\ 0 & \text{otherwise.} \end{cases} \] The random variables \(X_k\) (\(2 \le k \le n\)) are defined by \[ X_k = \begin{cases} 1 & \text{if the \((k-1)\)th letter is \(B\) and the \(k\)th is \(A\)};\\ 0 & \text{otherwise.} \end{cases} \] The random variable \(S\) is defined by \(S = \sum\limits_ {i=1}^n X_i\,\).

  1. Find expressions for \(\E(X_i)\), distinguishing between the cases \(i=1\) and \(i\ne1\), and show that \(\E(S)= \dfrac{a(b+1)}n\,\).
  2. Show that:
    1. for \(j\ge3\), \(\E(X_1X_j) = \dfrac{a(a-1)b}{n(n-1)(n-2)}\,\);
    2. \[ \sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n \E(X_iX_j)\bigg) = \dfrac{a(a-1)b(b-1)}{2n(n-1)}\,\]
    3. \(\var(S) = \dfrac {a(a-1)b(b+1)}{n^2(n-1)}\,\).

Solution
  1. Notice that \(\E[X_1] = \mathbb{P}(\text{first letter is }A) = \frac{a}{n}\). For \(i > 1\), \(X_i = 1\) requires the \((i-1)\)th letter to be \(B\) and the \(i\)th to be \(A\), which has probability \(\frac{b}{n} \cdot \frac{a}{n-1}\). So \begin{align*} && \E[S] &= \E[X_1] + \sum_{i=2}^n \E[X_i] \\ &&&= \frac{a}{n} + (n-1) \frac{ab}{n(n-1)} \\ &&&= \frac{a(b+1)}{n} \end{align*}
    1. The probability that \(X_1X_j = 1\) is \(\frac{a}{n} \cdot \frac{b}{n-1} \cdot \frac{a-1}{n-2} = \frac{a(a-1)b}{n(n-1)(n-2)}\): the three positions involved are distinct, and the order in which we fill them does not matter, so the first letter is an \(A\) with probability \(\frac{a}{n}\); given this, the \((j-1)\)th letter is a \(B\) with probability \(\frac{b}{n-1}\), and then the \(j\)th is an \(A\) with probability \(\frac{a-1}{n-2}\). Therefore \(\E[X_1X_j] = \frac{a(a-1)b}{n(n-1)(n-2)}\).
    2. \(\E[X_iX_j]\) when the pairs don't overlap is \(\frac{a}{n} \frac{b}{n-1} \frac{a-1}{n-2} \frac{b-1}{n-3}\), and so \begin{align*} && \sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n \E(X_iX_j)\bigg) &= \sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)}\bigg) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)}\sum\limits_{i=2}^{n-2} \bigg( \sum\limits_{j=i+2}^n 1\bigg) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)}\sum\limits_{i=2}^{n-2} (n-(i+1)) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)} \left ((n-1)(n-3)-\frac{(n-2)(n-1)}{2}+1 \right) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)} \left ( \frac{2n^2-8n+6-n^2+3n-2+2}{2}\right) \\ &&&= \frac{a(a-1)b(b-1)}{n(n-1)(n-2)(n-3)} \left ( \frac{n^2-5n+6}{2}\right) \\ &&&= \frac{a(a-1)b(b-1)}{2n(n-1)} \end{align*}
    3. We also need to consider the other cross terms. \(X_iX_{i+1}=0\). (Since \(X_i = 1\) means the \(i\)th letter is \(A\) and \(X_{i+1} = 1\) means the \(i\)th letter is \(B\)). It's the same story for \(X_1X_2\), and so all the cross terms are accounted for. Therefore \begin{align*} && \E[S^2] &= \E \left [\sum X_i^2 + 2\sum_{i \neq j} X_i X_j \right] \\ &&&= \frac{a(b+1)}{n} +2(n-2)\frac{a(a-1)b}{n(n-1)(n-2)}+ 2 \frac{a(a-1)b(b-1)}{2n(n-1)} \\ &&&= \frac{a(b+1)}{n} +\frac{2a(a-1)b}{n(n-1)} + \frac{a(a-1)b(b-1)}{n(n-1)} \\ &&&= \frac{a(b+1)}{n} +\frac{a(a-1)b(b+1)}{n(n-1)} \\ && \var[S] &= \E[S^2] - \left ( \E[S] \right)^2 \\ &&&= \frac{a(b+1)}{n} + \frac{a(a-1)b(b+1)}{n(n-1)} - \frac{a^2(b+1)^2}{n^2} \\ &&&= \frac{a(b+1) \left (n(n-1) + (a-1)b n -a(b+1)(n-1) \right)}{n^2(n-1)} \\ &&&= \frac{a(b+1) \left ( (n-a)(n-b-1) \right)}{n^2(n-1)} \\ &&&= \frac{a(b+1) \left ( b(a-1) \right)}{n^2(n-1)} \\ \end{align*}
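The formulas for \(\E(S)\) and \(\var(S)\) can be verified by brute force for small \(a\) and \(b\); a short exact-enumeration sketch (illustrative only, not part of the solution):

```python
# Enumerate every distinct, equally likely arrangement of a A's and b B's,
# compute S (the indicator sum defined in the question) for each, and take
# exact moments.
from fractions import Fraction
from itertools import permutations

def moments(a, b):
    """Return (E(S), Var(S)) exactly, by enumeration."""
    n = a + b
    arrangements = set(permutations("A" * a + "B" * b))  # dedupe repeats
    p = Fraction(1, len(arrangements))
    mean = mean_sq = Fraction(0)
    for arr in arrangements:
        # X_1 plus the indicators of a B immediately followed by an A
        s = (arr[0] == "A") + sum(arr[k - 1] == "B" and arr[k] == "A"
                                  for k in range(1, n))
        mean += p * s
        mean_sq += p * s * s
    return mean, mean_sq - mean ** 2
```

For \(a=2,\ b=2\) this gives mean \(\frac32\) and variance \(\frac14\), matching \(\frac{a(b+1)}{n}\) and \(\frac{a(a-1)b(b+1)}{n^2(n-1)}\).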
2010 Paper 3 Q13

In this question, \({\rm Corr}(U,V)\) denotes the product moment correlation coefficient between the random variables \(U\) and \(V\), defined by \[ \mathrm{Corr}(U,V) \equiv \frac{\mathrm{Cov}(U,V)}{\sqrt{\var(U)\var(V)}}\,. \] The independent random variables \(Z_1\), \(Z_2\) and \(Z_3\) each have expectation 0 and variance 1. What is the value of \(\mathrm{Corr} (Z_1,Z_2)\)? Let \(Y_1 = Z_1\) and let \[ Y_2 = \rho _{12} Z_1 + (1 - {\rho_{12}^2})^{ \frac12} Z_ 2\,, \] where \(\rho_{12}\) is a given constant with \(-1<\rho_{12}<1\). Find \(\E(Y_2)\), \(\var(Y_2)\) and \(\mathrm{Corr}(Y_1, Y_2)\). Now let \(Y_3 = aZ_1 + bZ_2 + cZ_3\), where \(a\), \(b\) and \(c\) are real constants and \(c\ge0\). Given that \(\E(Y_3) = 0\), \(\var(Y_3) = 1\), \( \mathrm{Corr}(Y_1, Y_3) =\rho_{13} \) and \( \mathrm{Corr}(Y_2, Y_3)= \rho_{23}\), express \(a\), \(b\) and \(c\) in terms of \(\rho_{23}\), \(\rho_{13}\) and \(\rho_{12}\). Given constants \(\mu_i\) and \(\sigma_i\), for \(i=1\), \(2\) and \(3\), give expressions in terms of the \(Y_i\) for random variables \(X_i\) such that \(\E(X_i) = \mu_i\), \(\var(X_i) = \sigma_ i^2\) and \(\mathrm{Corr}(X_i,X_j) = \rho_{ij}\).

Solution
\begin{align*} \mathrm{Corr} (Z_1,Z_2) &= \frac{\mathrm{Cov}(Z_1,Z_2)}{\sqrt{\var(Z_1)\var(Z_2)}} \\ &= \frac{\mathbb{E}(Z_1 Z_2)}{\sqrt{1 \cdot 1}} \\ &= \frac{\mathbb{E}(Z_1)\mathbb{E}(Z_2)}{\sqrt{1 \cdot 1}} \\ &= \frac{0}{1} \\ &= 0 \end{align*} \begin{align*} && \mathbb{E}(Y_2) &= \mathbb{E}(\rho_{12} Z_1 + (1 - {\rho_{12}^2})^{ \frac12} Z_ 2) \\ &&&= \mathbb{E}(\rho_{12} Z_1) + \mathbb{E}( (1 - {\rho_{12}^2})^{ \frac12} Z_ 2) \\ &&&= \rho_{12}\mathbb{E}( Z_1) + (1 - {\rho_{12}^2})^{ \frac12}\mathbb{E}( Z_ 2) \\ &&&= 0\\ \\ && \textrm{Var}(Y_2) &= \textrm{Var}(\rho _{12} Z_1 + (1 - {\rho_{12}^2})^{ \frac12} Z_ 2) \\ &&&= \textrm{Var}(\rho_{12} Z_1)+2\,\textrm{Cov}(\rho_{12} Z_1,(1 - {\rho_{12}^2})^{ \frac12} Z_ 2 ) + \textrm{Var}((1 - {\rho_{12}^2})^{ \frac12} Z_ 2) \\ &&&= \rho_{12}^2\textrm{Var}( Z_1)+2\rho_{12} (1 - {\rho_{12}^2})^{ \frac12} \textrm{Cov}(Z_1, Z_ 2 ) + (1 - {\rho_{12}^2})\textrm{Var}(Z_ 2) \\ &&&= \rho_{12}^2 + (1-\rho_{12}^2) = 1 \\ \\ && \textrm{Cov}(Y_1, Y_2) &= \mathbb{E}((Y_1-0)(Y_2-0)) \\ &&&= \mathbb{E}(Z_1 \cdot (\rho _{12} Z_1 + (1 - {\rho_{12}^2})^{ \frac12} Z_ 2)) \\ &&&= \rho_{12} \mathbb{E}(Z_1^2) + (1-\rho_{12}^2)^{\frac12}\mathbb{E}(Z_1 Z_2) \\ &&&= \rho_{12} \\ \Rightarrow && \textrm{Corr}(Y_1, Y_2) &= \frac{\textrm{Cov}(Y_1, Y_2)}{\sqrt{\textrm{Var}(Y_1)\textrm{Var}(Y_2)}} \\ &&&= \frac{\rho_{12}}{1 \cdot 1} = \rho_{12} \end{align*} Now let \(Y_3 =aZ_1 +bZ_2+cZ_3\). Then \(\mathbb{E}(Y_3) = 0\) automatically, since each \(Z_i\) has mean \(0\), and \(\textrm{Var}(Y_3) = a^2+b^2+c^2 = 1\). We require \(\textrm{Corr}(Y_1, Y_3) = \rho_{13}\) and \(\textrm{Corr}(Y_2, Y_3) = \rho_{23}\).
Since each \(Y_i\) has variance \(1\), correlations equal covariances here. \begin{align*} && \textrm{Corr}(Y_1,Y_3) &= \textrm{Cov}(Y_1, Y_3) \\ &&&= \textrm{Cov}(Z_1, aZ_1 +bZ_2+cZ_3) \\ &&&= a \\ \Rightarrow && a &= \rho_{13} \\ \\ && \textrm{Corr}(Y_2,Y_3) &= \textrm{Cov}(Y_2, Y_3) \\ &&&= \textrm{Cov}(\rho_{12}Z_1+(1-\rho_{12}^2)^\frac12Z_2, \rho_{13}Z_1 +bZ_2+cZ_3) \\ &&&= \rho_{12}\rho_{13}+(1-\rho_{12}^2)^\frac12b \\ \Rightarrow && \rho_{23} &= \rho_{12}\rho_{13}+(1-\rho_{12}^2)^\frac12b \\ \Rightarrow && b &= \frac{\rho_{23}-\rho_{12}\rho_{13}}{(1-\rho_{12}^2)^\frac12} \\ && c &= \sqrt{1-\rho_{13}^2-\frac{(\rho_{23}-\rho_{12}\rho_{13})^2}{1-\rho_{12}^2}} \end{align*} Finally, let \(X_i = \mu_i + \sigma_i Y_i\); then \(\E(X_i) = \mu_i\), \(\var(X_i) = \sigma_i^2\) and \(\mathrm{Corr}(X_i, X_j) = \mathrm{Corr}(Y_i, Y_j) = \rho_{ij}\), since correlation is unchanged by shifting and positive scaling.
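The construction can be illustrated by simulation: build \(Y_1, Y_2, Y_3\) from independent standard normals using the formulas above and check that the sample correlations land near the targets. A sketch (the particular \(\rho\) values and sample size are arbitrary choices of ours):

```python
# Monte Carlo illustration of the correlated-normals construction.
import math
import random

rho12, rho13, rho23 = 0.5, 0.3, 0.4       # arbitrary admissible targets
a = rho13
b = (rho23 - rho12 * rho13) / math.sqrt(1 - rho12 ** 2)
c = math.sqrt(1 - a * a - b * b)

rng = random.Random(1)
N = 100000
y1, y2, y3 = [], [], []
for _ in range(N):
    z1, z2, z3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    y1.append(z1)
    y2.append(rho12 * z1 + math.sqrt(1 - rho12 ** 2) * z2)
    y3.append(a * z1 + b * z2 + c * z3)

def corr(u, v):
    """Sample correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v)) / n
    return cov / math.sqrt(sum((x - mu) ** 2 for x in u) / n
                           * sum((y - mv) ** 2 for y in v) / n)
```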
2007 Paper 3 Q12

I choose a number from the integers \(1, 2, \ldots, (2n-1)\) and the outcome is the random variable \(N\). Calculate \( \E(N)\) and \(\E(N^2)\). I then repeat a certain experiment \(N\) times, the outcome of the \(i\)th experiment being the random variable \(X_i\) (\(1\le i \le N\)). For each \(i\), the random variable \(X_i\) has mean \(\mu\) and variance \(\sigma^2\), and \(X_i\) is independent of \(X_j\) for \(i\ne j\) and also independent of \(N\). The random variable \(Y\) is defined by \(Y= \sum\limits_{i=1}^NX_i\). Show that \(\E(Y)=n\mu\) and that \(\mathrm{Cov}(Y,N) = \frac13n(n-1)\mu\). Find \(\var(Y) \) in terms of \(n\), \(\sigma^2\) and \(\mu\).

Solution
\begin{align*} && \E[N] &= \sum_{i=1}^{2n-1} \frac{i}{2n-1} \\ &&&= \frac{2n(2n-1)}{2(2n-1)} = n\\ && \E[N^2] &= \sum_{i=1}^{2n-1} \frac{i^2}{2n-1} \\ &&&= \frac{(2n-1)(2n)(4n-1)}{6(2n-1)} \\ &&&= \frac{n(4n-1)}{3} \\ && \var[N] &= \frac{n(4n-1)}{3} - n^2 \\ &&&= \frac{n^2-n}{3} \end{align*} \begin{align*} && \E[Y] &= \E \left [ \E \left [ \sum_{i=1}^N X_i \,\middle|\, N\right] \right]\\ &&&= \E \left[ N\mu \right] = n\mu \\ \\ && \mathrm{Cov}(Y,N) &= \mathbb{E}[YN] - \E[Y]\E[N] \\ &&&= \E \left [ \E \left [N \sum_{i=1}^N X_i \,\middle|\, N\right] \right] - n^2 \mu \\ &&&= \E[N^2\mu] - n^2 \mu \\ &&&= \left ( \frac{n(4n-1)}{3} - n^2 \right) \mu \\ &&&= \frac{n^2-n}{3}\mu \\ \\ && \E[Y^2] &= \E \left [ \E \left [ \left ( \sum_{i=1}^N X_i \right) ^2 \,\middle|\, N \right ] \right] \\ &&&= \E \left [ \E \left [ \sum_{i=1}^N X_i ^2 + 2\sum_{i<j} X_iX_j \,\middle|\, N \right ] \right] \\ &&&= \E \left [ N\,\E[X_1^2] + (N^2-N)\,\E[X_1]\E[X_2] \right] \\ &&&= \E \left [ N(\sigma^2 + \mu^2) + (N^2-N)\mu^2\right] \\ &&&= n(\sigma^2+\mu^2) + \left ( \frac{n(4n-1)}{3}-n \right)\mu^2 \\ &&&= n\sigma^2 + \frac{4n^2-n}{3} \mu^2 \\ \Rightarrow && \var[Y] &= n\sigma^2 + \frac{4n^2-n}{3} \mu^2 - n^2\mu^2 \\ &&&= n\sigma^2 + \frac{n^2-n}{3} \mu^2 \end{align*}
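The final answer agrees with the standard identity \(\var(Y) = \E(N)\sigma^2 + \var(N)\mu^2\) for a random sum (law of total variance). A small exact-arithmetic sketch using that identity (the test values of \(\mu\) and \(\sigma^2\) are arbitrary):

```python
# Cross-check Var(Y) = E(N) sigma^2 + Var(N) mu^2 against the closed form,
# with N uniform on {1, ..., 2n-1}.
from fractions import Fraction

def var_Y(n, mu, sigma2):
    vals = range(1, 2 * n)
    EN = Fraction(sum(vals), 2 * n - 1)
    EN2 = Fraction(sum(i * i for i in vals), 2 * n - 1)
    return EN * sigma2 + (EN2 - EN ** 2) * mu ** 2

mu, sigma2 = Fraction(3), Fraction(2)      # arbitrary test values
expected = [n * sigma2 + Fraction(n * n - n, 3) * mu ** 2 for n in range(2, 7)]
computed = [var_Y(n, mu, sigma2) for n in range(2, 7)]
```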
2006 Paper 3 Q14

For any random variables \(X_1\) and \(X_2\), state the relationship between \(\E(aX_1+bX_2)\) and \(\E(X_1)\) and \(\E(X_2)\), where \(a\) and \(b\) are constants. If \(X_1\) and \(X_2\) are independent, state the relationship between \(\E(X_1X_2)\) and \(\E(X_1)\) and \(\E(X_2)\). An industrial process produces rectangular plates. The length and the breadth of the plates are modelled by independent random variables \(X_1\) and \(X_2\) with non-zero means \(\mu_1\) and \(\mu_2\) and non-zero standard deviations \(\sigma_1\) and \(\sigma_2\), respectively. Using the results in the paragraph above, and without quoting a formula for \(\var(aX_1+bX_2)\), find the means and standard deviations of the perimeter \(P\) and area \(A\) of the plates. Show that \(P\) and \(A\) are not independent. The random variable \(Z\) is defined by \(Z=P-\alpha A\), where \(\alpha \) is a constant. Show that \(Z\) and \(A\) are not independent if \[ \alpha \ne \dfrac{2(\mu_1^{\vphantom2} \sigma_2^2 +\mu_2^{\vphantom2}\sigma_1^2)} { \mu_1^2 \sigma_2^2 +\mu_2^2\sigma_1^2 + \sigma_1^2\sigma_2^2 } \;. \] Given that \(X_1\) and \(X_2\) can each take values 1 and 3 only, and that they each take these values with probability \(\frac 12\), show that \(Z\) and \(A\) are not independent for any value of \(\alpha\).

Solution
\(\E(aX_1+bX_2) = a \E(X_1) + b\E(X_2)\) for any \(X_1, X_2\); \(\E(X_1X_2)=\E(X_1)\E(X_2)\) if \(X_1, X_2\) are independent. \begin{align*} && \E(P) &= \E(2(X_1+X_2)) = 2(\E[X_1]+\E[X_2]) \\ &&&= 2(\mu_1 + \mu_2) \\ && \var(P) &= \E[\left ( 2(X_1+X_2) \right)^2] - \E[2(X_1+X_2)]^2 \\ &&&= 4\E[X_1^2+2X_1X_2+X_2^2] -4(\mu_1 + \mu_2)^2 \\ &&&= 4(\mu_1^2 + \sigma_1^2 + 2\mu_1\mu_2 + \mu_2^2 + \sigma_2^2) - 4(\mu_1 + \mu_2)^2 \\ &&&= 4(\sigma_1^2+\sigma_2^2) \\ && \textrm{SD}(P) &= 2 \sqrt{\sigma_1^2+\sigma_2^2}\\ \\ && \E(A) &= \E[X_1X_2] = \E[X_1]\E[X_2] \\ &&&= \mu_1\mu_2 \\ && \var(A) &= \E[(X_1X_2)^2] - (\mu_1\mu_2)^2 \\ &&&= (\mu_1^2+\sigma_1^2)(\mu_2^2+\sigma_2^2) - (\mu_1\mu_2)^2\\ &&&= \mu_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2 + \sigma_1^2 \sigma_2^2\\ && \textrm{SD}(A) &= \sqrt{\mu_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2 + \sigma_1^2 \sigma_2^2} \end{align*} \begin{align*} \E[PA] &= \E[2(X_1+X_2)X_1X_2] \\ &= 2\E[X_1^2X_2] + 2\E[X_1X_2^2]\\ &= 2(\mu_1^2 + \sigma_1^2)\mu_2 + 2\mu_1 (\mu_2^2+\sigma_2^2)\\ &= \E[P]\E[A] + 2(\sigma_1^2\mu_2 + \sigma_2^2\mu_1) \\ &\neq \E[P]\E[A] \end{align*} since \(\E[P]\E[A] = 2(\mu_1+\mu_2)\mu_1\mu_2\) and \(\sigma_1^2\mu_2 + \sigma_2^2\mu_1 \neq 0\) (lengths and breadths are positive). Therefore \(P\) and \(A\) are not independent. \begin{align*} && \E[Z] &= \E[P] - \alpha \E[A] \\ &&&= 2(\mu_1+\mu_2) - \alpha \mu_1 \mu_2 \\ \\ && \E[ZA] &= \E[PA - \alpha A^2] \\ &&&= 2(\mu_1^2 + \sigma_1^2)\mu_2 + 2\mu_1 (\mu_2^2+\sigma_2^2) - \alpha \E[A^2] \\ &&&= 2(\mu_1^2 + \sigma_1^2)\mu_2 + 2\mu_1 (\mu_2^2+\sigma_2^2) - \alpha \E[X_1^2]\E[X_2^2] \\ &&&= 2(\mu_1^2 + \sigma_1^2)\mu_2 + 2\mu_1 (\mu_2^2+\sigma_2^2) - \alpha (\mu_1^2+\sigma_1^2)(\mu_2^2+\sigma_2^2) \end{align*} If \(Z\) and \(A\) were independent we would need \(\E[Z]\E[A] = \E[ZA]\): \begin{align*} && (2(\mu_1+\mu_2) - \alpha \mu_1 \mu_2) \mu_1\mu_2 &= 2(\mu_1^2 + \sigma_1^2)\mu_2 + 2\mu_1 (\mu_2^2+\sigma_2^2) - \alpha (\mu_1^2+\sigma_1^2)(\mu_2^2+\sigma_2^2) \\ \Rightarrow && 2(\mu_1^2\mu_2+\mu_1\mu_2^2) - \alpha \mu_1^2\mu_2^2 &= 2(\mu_1^2\mu_2+\mu_1\mu_2^2) + 2\sigma_1^2\mu_2 + 2\sigma_2^2\mu_1 - \alpha (\mu_1^2+\sigma_1^2)(\mu_2^2+\sigma_2^2) \\ \Rightarrow && \alpha \left((\mu_1^2+\sigma_1^2)(\mu_2^2+\sigma_2^2) - \mu_1^2\mu_2^2\right) &= 2(\sigma_1^2\mu_2 + \sigma_2^2\mu_1) \\ \Rightarrow && \alpha &= \frac{ 2(\sigma_1^2\mu_2 + \sigma_2^2\mu_1) }{\mu_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2 + \sigma_1^2 \sigma_2^2} \end{align*} Therefore, if \(\alpha\) is not equal to this value then \(\E[ZA] \neq \E[Z]\E[A]\), so \(Z\) and \(A\) are not independent. \begin{array}{c|c|c|c|c|c} & X_1 & X_2 & A & P & Z \\ \hline 0.25 & 1 & 1 & 1 & 4 & 4-\alpha \\ 0.25 & 1 & 3 & 3 & 8 & 8-3\alpha \\ 0.25 & 3 & 1 & 3 & 8 & 8-3\alpha \\ 0.25 & 3 & 3 & 9 & 12 & 12-9\alpha \\ \end{array} Since \(A = 1\) forces \(Z = 4-\alpha\), we have \(\mathbb{P}(A = 1, Z = 4-\alpha) = \mathbb{P}(A = 1) = \frac14\); independence would then require \(\mathbb{P}(Z = 4-\alpha) = 1\), i.e. \(4-\alpha = 8-3\alpha = 12-9\alpha\). But the first equality requires \(\alpha = 2\) and the second \(\alpha = \frac23\), so no value of \(\alpha\) works: \(Z\) and \(A\) are not independent for any \(\alpha\).
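The last part can also be confirmed by enumerating the four equally likely plates; a sketch (illustrative only; here \(\mu_1 = \mu_2 = 2\) and \(\sigma_1^2 = \sigma_2^2 = 1\), so the critical value from the formula above is \(\alpha = \frac89\)):

```python
# For X1, X2 uniform on {1, 3}, check that Z = P - alpha*A and A fail the
# independence test P(A=a, Z=z) = P(A=a)P(Z=z) for several alpha values,
# including the critical alpha = 8/9.
from fractions import Fraction
from itertools import product

def dependent(alpha):
    joint = {}
    for x1, x2 in product((1, 3), repeat=2):   # four equally likely plates
        A = x1 * x2
        Z = 2 * (x1 + x2) - alpha * A
        joint[(A, Z)] = joint.get((A, Z), Fraction(0)) + Fraction(1, 4)
    pa, pz = {}, {}
    for (a, z), p in joint.items():
        pa[a] = pa.get(a, Fraction(0)) + p
        pz[z] = pz.get(z, Fraction(0)) + p
    # dependent iff some cell breaks the product rule
    return any(joint.get((a, z), Fraction(0)) != pa[a] * pz[z]
               for a in pa for z in pz)

alphas = [Fraction(8, 9), Fraction(2), Fraction(2, 3), Fraction(0)]
```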
2005 Paper 3 Q12

Five independent timers time a runner as she runs four laps of a track. Four of the timers measure the individual lap times, the results of the measurements being the random variables \(T_1\) to \(T_4\), each of which has variance \(\sigma^2\) and expectation equal to the true time for the lap. The fifth timer measures the total time for the race, the result of the measurement being the random variable \(T\) which has variance \(\sigma^2\) and expectation equal to the true race time (which is equal to the sum of the four true lap times). Find a random variable \(X\) of the form \(aT+b(T_1+T_2+T_3+T_4)\), where \(a\) and \(b\) are constants independent of the true lap times, with the two properties:

  1. whatever the true lap times, the expectation of \(X\) is equal to the true race time;
  2. the variance of \(X\) is as small as possible.
Find also a random variable \(Y\) of the form \(cT+d(T_1+T_2+T_3+T_4)\), where \(c\) and \(d\) are constants independent of the true lap times, with the property that, whatever the true lap times, the expectation of \(Y^2\) is equal to \(\sigma^2\). In one particular race, \(T\) takes the value 220 seconds and \((T_1 + T_2 + T_3 + T_4)\) takes the value \(220.5\) seconds. Use the random variables \(X\) and \(Y\) to estimate an interval in which the true race time lies.

Solution
Let the expected total time for the race be \(\mu\). Let \(X = aT + b(T_1 + T_2+T_3+T_4)\); then \(\E[X] = a\E[T] + b\E[T_1+\cdots+T_4] = a \mu + b \mu = (a+b)\mu\). So \(a+b=1\). \begin{align*} && \var[X] &= a^2\var[T] + b^2(\var[T_1] + \var[T_2] + \var[T_3] + \var[T_4]) \\ &&&= a^2\sigma^2 + 4b^2 \sigma^2 \\ &&& = \sigma^2 (a^2 + 4(1-a)^2 ) \\ &&&= \sigma^2 (5a^2 - 8a + 4) \\ &&&= \sigma^2 \left ( 5 \left ( a - \frac45 \right)^2 - \frac{16}{5}+4 \right)\\ &&&= \sigma^2 \left ( 5 \left ( a - \frac45 \right)^2 + \frac{4}{5}\right) \end{align*} Therefore the variance is minimised when \(a = \frac45, b = \frac15\). Let \(Y = cT + d(T_1 + T_2+T_3+T_4)\); then \begin{align*} && \E[Y^2] &= \E \left [c^2T^2 + 2cd T(T_1+T_2+T_3+T_4) + d^2(T_1+T_2+T_3+T_4)^2 \right] \\ &&&= c^2 (\mu^2 + \sigma^2) + 2cd \mu^2 + d^2 (\var[T_1 + \cdots + T_4] + \mu^2) \\ &&&= c^2(\mu^2+\sigma^2) + 2cd \mu^2 + d^2(4\sigma^2 + \mu^2) \\ &&&= (c^2 + 2cd + d^2) \mu^2 + (c^2+4d^2) \sigma^2 \\ &&&= (c+d)^2 \mu^2 + (c^2+4d^2) \sigma^2 \\ \\ \Rightarrow && d &= -c \\ && 1 &= c^2 + 4d^2 \\ \Rightarrow && c &= \pm \frac{1}{\sqrt5} \\ && d &= \mp \frac{1}{\sqrt5} \end{align*} Given our results, our best estimate for \(\mu\) is \(\frac45 \cdot 220 + \frac15 \cdot 220.5 = 220.1\). Our estimate of \(\sigma^2\) is \(Y^2 = \left( \frac{1}{\sqrt{5}}(220 - 220.5) \right)^2 = \frac{1}{20}\). Note that \(\var[X] = \frac45\sigma^2 \approx \frac{1}{25}\), so the standard error of \(X\) is about \(\frac15\); taking two standard errors either side gives the interval \((220.1 - 0.4, 220.1 + 0.4) = (219.7, 220.5)\).
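A quick numeric sketch of the minimisation and the point estimate (illustrative only; the grid resolution is arbitrary):

```python
# Scan a grid of a values to confirm a = 4/5 minimises a^2 + 4(1-a)^2
# (the sigma^2 factor is constant and can be dropped), then form the
# weighted estimate of the race time.
best_a = min((a / 1000 for a in range(1001)),
             key=lambda a: a * a + 4 * (1 - a) ** 2)

estimate = 0.8 * 220 + 0.2 * 220.5   # weighted combination of the two clocks
```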
2004 Paper 3 Q12

A team of \(m\) players, numbered from \(1\) to \(m\), puts on a set of a \(m\) shirts, similarly numbered from \(1\) to \(m\). The players change in a hurry, so that the shirts are assigned to them randomly, one to each player. Let \(C_i\) be the random variable that takes the value \(1\) if player \(i\) is wearing shirt \(i\), and 0 otherwise. Show that \(\mathrm{E}\left(C_1\right)={1 \over m}\) and find \(\var \left(C_1\right)\) and \(\mathrm{Cov}\left(C_1 \, , \; C_2 \right) \,\). Let \(\, N = C_1 + C_2 + \cdots + C_m \,\) be the random variable whose value is the number of players who are wearing the correct shirt. Show that \(\mathrm{E}\left(N\right)= \var \left(N\right) = 1 \,\). Explain why a Normal approximation to \(N\) is not likely to be appropriate for any \(m\), but that a Poisson approximation might be reasonable. In the case \(m = 4\), find, by listing equally likely possibilities or otherwise, the probability that no player is wearing the correct shirt and verify that an appropriate Poisson approximation to \(N\) gives this probability with a relative error of about \(2\%\). [Use \(\e \approx 2\frac{72}{100} \,\).]

Solution
There are \(m!\) different ways of assigning the shirts, and in \((m-1)!\) of them player \(1\) gets their own shirt, i.e. \(\mathbb{E}(C_1) = \mathbb{P}(\text{player }1\text{ gets own shirt}) = \frac{(m-1)!}{m!} = \frac{1}{m}\). Since \(C_1^2 = C_1\), \(\var(C_1) = \mathbb{E}(C_1^2) - [\mathbb{E}(C_1)]^2 = \frac{1}{m} - \frac{1}{m^2} = \frac{m-1}{m^2}\). For two players, there are \((m-2)!\) assignments in which both get their own shirts, therefore \(\textrm{Cov}(C_1,C_2) = \mathbb{E}(C_1C_2) - \mathbb{E}(C_1)\mathbb{E}(C_2) = \frac{(m-2)!}{m!} - \frac{1}{m^2} = \frac{1}{m(m-1)} - \frac{1}{m^2} = \frac{1}{m^2(m-1)}\). \begin{align*} \mathbb{E}(N) &= \mathbb{E}(C_1 + C_2 + \cdots + C_m) \\ &= \mathbb{E}(C_1) + \mathbb{E}(C_2) + \cdots + \mathbb{E}(C_m) \\ &= \frac{1}{m} + \frac{1}{m} +\cdots+ \frac1m \\ &= 1 \\ \\ \var(N) &= \sum_{r=1}^m \var(C_r) + 2\sum_{r=1}^{m-1} \sum_{s=r+1}^{m} \textrm{Cov}(C_r,C_s) \\ &= m \frac{m-1}{m^2} + 2 \frac{m(m-1)}{2}\frac{1}{m^2(m-1)} \\ &=\frac{m-1}{m} + \frac{1}{m} \\ &= 1 \end{align*} A Normal approximation would have to be \(N(1,1)\), which gives substantial probability to impossible negative values (for instance, \(-1\) or fewer correct shirts would be as likely as \(3\) or more), so it is not appropriate for any \(m\). A Poisson approximation is more reasonable: a Poisson distribution has mean equal to variance, matching \(\mathbb{E}(N) = \var(N) = 1\), and for large \(m\) the covariance between the indicators is very small, so \(N\) behaves like a count of rare, nearly independent events. For \(m = 4\), the arrangements with no player wearing the correct shirt are \[ BADC,\ BCDA,\ BDAC,\ CADB,\ CDAB,\ CDBA,\ DABC,\ DCAB,\ DCBA, \] i.e. \(9\) of the \(24\) equally likely assignments, so the probability is \(\frac{9}{24}\). \(\textrm{Po}(1)\) gives this probability as \(e^{-1}\), a relative error of \begin{align*} \frac{e^{-1}-\frac{9}{24}}{\frac9{24}} &\approx \frac{\frac{100}{272} - \frac{9}{24}}{\frac9{24}} \\ &= -\frac{1}{51} \\ &\approx -2\% \end{align*}
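All of the \(m = 4\) claims can be checked by listing the \(24\) assignments; a short enumeration sketch (illustrative only):

```python
# Count fixed points over all 24 equally likely shirt assignments:
# E(N) = Var(N) = 1, the no-match probability is 9/24, and Po(1) gives
# that probability with a relative error of about -2%.
import math
from fractions import Fraction
from itertools import permutations

counts = [sum(i == p[i] for i in range(4)) for p in permutations(range(4))]
EN = Fraction(sum(counts), len(counts))
VarN = Fraction(sum(c * c for c in counts), len(counts)) - EN ** 2
p_none = Fraction(counts.count(0), len(counts))           # 9/24
rel_err = (math.exp(-1) - float(p_none)) / float(p_none)  # about -0.02
```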
2002 Paper 3 Q14

Prove that, for any two discrete random variables \(X\) and \(Y\), \[ \mathrm{Var} \left(X + Y \right) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2 \, \mathrm{Cov}(X,Y), \] where \(\mathrm{Var}(X)\) is the variance of \(X\) and \(\mathrm{Cov}(X,Y)\) is the covariance of \(X\) and \(Y\). When a Grandmaster plays a sequence of \(m\) games of chess, she is, independently, equally likely to win, lose or draw each game. If the values of the random variables \(W\), \(L\) and \(D\) are the numbers of her wins, losses and draws respectively, justify briefly the following claims:

  1. \(W + L + D\) has variance \(0\,\);
  2. \(W + L\) has a binomial distribution.
Find the value of \(\displaystyle {\mathrm{Cov}(W,L) \over \sqrt{\mathrm{Var}(W) \mathrm{Var}(L)}}\;\).

Show Solution
\begin{align*} && \var[X+Y] &= \E\left [(X+Y-\E[X+Y])^2 \right] \\ &&&= \E \left [ (X - \E[X] + Y - \E[Y])^2 \right] \\ &&&= \E \left [(X - \E[X])^2 + (Y-\E[Y])^2 + 2(X-\E[X])(Y-\E[Y]) \right] \\ &&&= \E \left [(X - \E[X])^2 \right]+\E \left [(Y-\E[Y])^2 \right]+\E \left [2(X-\E[X])(Y-\E[Y]) \right] \\ &&&= \var[X] + \var[Y] + 2 \mathrm{Cov}(X,Y) \end{align*}
  1. \(W+L+D = m\) where \(m\) is the number of games, which has variance \(0\). Therefore \(W+L+D\) has variance \(0\).
  2. The probability of a decisive game is \(\frac23\) and \(W+L\) is the number of decisive games. Each game is independent so this meets the criteria for a binomial distribution.
Notice \(W+L \sim B(m, \tfrac23)\) and \(W, L, D \sim B(m, \tfrac13)\); in particular \(\var[W+L] = m \tfrac23 \tfrac13 = \tfrac29m\) and \(\var[W] = \var[L] = \var[D] = m \tfrac13 \tfrac23 = \tfrac29m\). \begin{align*} && \var[W+L] &= \var[W] + \var[L] + 2\mathrm{Cov}(W,L) \\ \Rightarrow && \tfrac29m &= \tfrac29m + \tfrac29m + 2\mathrm{Cov}(W,L) \\ \Rightarrow && \mathrm{Cov}(W,L) &= -\tfrac19m \\ \Rightarrow && \frac{\mathrm{Cov}(W,L) }{\sqrt{\var[W]\var[L]}} &= \frac{-\tfrac19m}{\tfrac29m} = -\frac12 \end{align*}
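The answer \(-\tfrac12\) can be confirmed by exact enumeration of all \(3^m\) game sequences, using `fractions` for exact arithmetic (a sketch; the function name is mine):

```python
from fractions import Fraction
from itertools import product

def win_loss_moments(m):
    """Exact Cov(W, L), Var(W), Var(L) over all 3^m equally likely sequences
    of m games, each won/lost/drawn with probability 1/3."""
    p = Fraction(1, 3) ** m  # probability of any particular sequence
    EW = EL = EWL = EW2 = EL2 = Fraction(0)
    for seq in product("WLD", repeat=m):
        w, l = seq.count("W"), seq.count("L")
        EW += p * w
        EL += p * l
        EWL += p * w * l
        EW2 += p * w * w
        EL2 += p * l * l
    return EWL - EW * EL, EW2 - EW ** 2, EL2 - EL ** 2

cov, varW, varL = win_loss_moments(4)
# cov = -4/9 and varW = varL = 8/9, so cov / sqrt(varW * varL) = -1/2
```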
2000 Paper 3 Q14
D: 1700.0 B: 1500.0

The random variable \(X\) takes only the values \(x_1\) and \(x_2\) (where \( x_1 \not= x_2 \)), and the random variable \(Y\) takes only the values \(y_1\) and \(y_2\) (where \(y_1 \not= y_2\)). Their joint distribution is given by $$ \P ( X = x_1 , Y = y_1 ) = a \ ; \ \ \P ( X = x_1 , Y = y_2 ) = q - a \ ; \ \ \P ( X = x_2 , Y = y_1 ) = p - a \ . $$ Show that if \(\E(X Y) = \E(X)\E(Y)\) then $$ (a - p q ) ( x_1 - x_2 ) ( y_1 - y_2 ) = 0 . $$ Hence show that two random variables each taking only two distinct values are independent if \(\E(X Y) = \E(X) \E(Y)\). Give a joint distribution for two random variables \(A\) and \(B\), each taking the three values \(- 1\), \(0\) and \(1\) with probability \({1 \over 3}\), which have \(\E(A B) = \E( A)\E (B)\), but which are not independent.

Show Solution
\begin{align*} \mathbb{P}(X = x_1) &= a + (q - a) = q \\ \mathbb{P}(X = x_2) &= 1 - q \\ \mathbb{P}(Y = y_1) & = a + (p - a) = p \\ \mathbb{P}(Y = y_2) & = 1 - p \end{align*} \begin{align*} \mathbb{E}(X)\mathbb{E}(Y) &= \l qx_1 + (1-q)x_2 \r \l p y_1 + (1-p)y_2\r \\ &= qpx_1y_1 + q(1-p)x_1y_2 + (1-q)px_2y_1 + (1-q)(1-p)x_2y_2 \\ \mathbb{E}(XY) &= ax_1y_1 + (q-a)x_1y_2 + (p-a)x_2y_1 + (1 + a - p - q)x_2y_2 \end{align*} Therefore \(\mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)\) is a polynomial of degree 2 in the \(x_i, y_i\). If \(x_1 = x_2\) then we have: \begin{align*} \mathbb{E}(X)\mathbb{E}(Y) &=x_1 \l p y_1 + (1-p)y_2\r \\ \mathbb{E}(XY) &= x_1(ay_1 + (q-a)y_2 + (p-a)y_1 + (1 + a - p - q)y_2) \\ &= x_1 (py_1 + (1-p)y_2) \end{align*} so the difference vanishes and \((x_1 - x_2)\) is a factor; by symmetry \((y_1 - y_2)\) is also a factor. It remains to check the coefficient of \(x_1y_1\), which is \(a - pq\), to complete the factorisation: \[\mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y) = (a - pq)(x_1 - x_2)(y_1 - y_2),\] so \(\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)\) gives \((a - pq)(x_1 - x_2)(y_1 - y_2) = 0\). For any two random variables each taking two distinct values, we can find \(a, p, q\) satisfying the relations above. The variables \(X\) and \(Y\) are independent if \(\mathbb{P}(X = x_i, Y = y_j) = \mathbb{P}(X = x_i)\mathbb{P}(Y = y_j)\) for all \(i, j\). Since \(x_1 \neq x_2\) and \(y_1 \neq y_2\), \(\E(XY) = \E(X)\E(Y) \Rightarrow a = pq\). But if \(a = pq\), we have \(\mathbb{P}(X = x_1, Y = y_1) = pq = \mathbb{P}(X = x_1)\mathbb{P}(Y = y_1)\), and the other relations drop out similarly, e.g. \(\mathbb{P}(X = x_1, Y = y_2) = q - pq = q(1-p)\); hence \(X\) and \(Y\) are independent. Consider \begin{align*} \mathbb{P}(A = -1, B = 1) &= \frac{1}{6} \\ \mathbb{P}(A = -1, B = -1) &= \frac{1}{6} \\ \mathbb{P}(A = 0, B = 0) &= \frac{1}{3} \\ \mathbb{P}(A = 1, B = -1) &= \frac{1}{6} \\ \mathbb{P}(A = 1, B = 1) &= \frac{1}{6} \end{align*} with all other pairs having probability \(0\). Each of \(A\) and \(B\) takes each of \(-1, 0, 1\) with probability \(\frac13\), and \(\mathbb{E}(AB) = \frac16(1 - 1 - 1 + 1) = 0 = \mathbb{E}(A)\mathbb{E}(B)\), but \(\mathbb{P}(A = 0, B = 0) = \frac13 \neq \frac19 = \mathbb{P}(A = 0)\mathbb{P}(B = 0)\), so \(A\) and \(B\) are not independent.
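A quick exact check of a joint distribution of this shape, with the four corners \((\pm1, \pm1)\) given probability \(\tfrac16\) each and \((0,0)\) probability \(\tfrac13\) (a sketch; variable names are mine):

```python
from fractions import Fraction

# Joint distribution: corners (+-1, +-1) get 1/6 each, (0, 0) gets 1/3,
# every other pair gets probability 0.
sixth, third = Fraction(1, 6), Fraction(1, 3)
joint = {(-1, -1): sixth, (-1, 1): sixth, (1, -1): sixth, (1, 1): sixth, (0, 0): third}

# Marginals of A and B
pA = {a: sum(p for (x, _), p in joint.items() if x == a) for a in (-1, 0, 1)}
pB = {b: sum(p for (_, y), p in joint.items() if y == b) for b in (-1, 0, 1)}

EA = sum(a * p for a, p in pA.items())
EB = sum(b * p for b, p in pB.items())
EAB = sum(a * b * p for (a, b), p in joint.items())
# E(AB) = E(A)E(B) = 0, yet P(A=0, B=0) = 1/3 != 1/9 = P(A=0)P(B=0)
```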
1997 Paper 3 Q14
D: 1700.0 B: 1516.0

An industrial process produces rectangular plates of mean length \(\mu_{1}\) and mean breadth \(\mu_{2}\). The length and breadth vary independently with non-zero standard deviations \(\sigma_{1}\) and \(\sigma_{2}\) respectively. Find the means and standard deviations of the perimeter and of the area of the plates. Show that the perimeter and area are not independent.

Show Solution
Let \(L\) and \(B\) be the length and breadth; they are independent with means \(\mu_1, \mu_2\) and variances \(\sigma_1^2, \sigma_2^2\) (no particular distribution is needed, only these moments), so \begin{align*} && \mathbb{E}(\text{perimeter}) &= \E(2(L+B)) \\ &&&= 2\E[L]+2\E[B] \\ &&&= 2(\mu_1+\mu_2) \\ &&\var[\text{perimeter}] &= \E\left [ (2(L+B))^2 \right] - \left ( \E[2(L+B)] \right)^2 \\ &&&= 4\E[L^2+2LB+B^2] - 4(\mu_1+\mu_2)^2 \\ &&&= 4(\sigma_1^2+\mu_1^2+2\mu_1\mu_2+\sigma_2^2+\mu_2^2) - 4(\mu_1+\mu_2)^2\\ &&&= 4(\sigma_1^2+\sigma_2^2) \\ &&\text{sd}[\text{perimeter}] &= 2\sqrt{\sigma_1^2+\sigma_2^2} \\ \\ && \E[\text{area}] &= \E[LB] \\ &&&= \E[L]\E[B] \\ &&&= \mu_1\mu_2 \\ && \var[\text{area}] &= \E[(LB)^2] - \left (\E[LB] \right)^2 \\ &&&= \E[L^2]\E[B^2]-\mu_1^2\mu_2^2 \\ &&&= (\mu_1^2+\sigma_1^2)(\mu_2^2+\sigma_2^2) -\mu_1^2\mu_2^2 \\ &&&= \sigma_1^2\mu_2^2 + \sigma_2^2\mu_1^2 + \sigma_1^2\sigma_2^2\\ && \text{sd}(\text{area}) &= \sqrt{\sigma_1^2\mu_2^2 + \sigma_2^2\mu_1^2 + \sigma_1^2\sigma_2^2} \\ \\ && \E[\text{perimeter} \cdot \text{area}] &= \E[2(L+B)LB] \\ &&&= 2\E[L^2]\E[B] + 2\E[L]\E[B^2] \\ &&&= 2(\sigma_1^2+\mu_1^2)\mu_2 + 2(\sigma_2^2+\mu_2^2)\mu_1 \\ && \E[\text{perimeter}] \E[\text{area}] &= 2(\mu_1+\mu_2) \cdot \mu_1\mu_2 \end{align*} The difference between the last two expressions is \(2(\sigma_1^2\mu_2 + \sigma_2^2\mu_1) \neq 0\), since the standard deviations are non-zero and the mean dimensions are positive. Therefore \(\E[\text{perimeter} \cdot \text{area}] \neq \E[\text{perimeter}]\,\E[\text{area}]\), so the perimeter and area cannot be independent. [See also STEP 2006 Paper 3 Q14]
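The moment formulas above hold for any independent \(L, B\) with the stated means and variances, so they can be verified exactly on a two-point distribution taking the values \(\mu_i \pm \sigma_i\) with probability \(\tfrac12\) each (a sketch with arbitrary example numbers; all names are mine):

```python
from fractions import Fraction

mu1, s1 = Fraction(5), Fraction(2)  # example mean/sd of length (arbitrary)
mu2, s2 = Fraction(3), Fraction(1)  # example mean/sd of breadth (arbitrary)

# Two-point distributions with the required means and variances
L_vals = [mu1 - s1, mu1 + s1]
B_vals = [mu2 - s2, mu2 + s2]
quarter = Fraction(1, 4)  # independence: each (L, B) pair has probability 1/4

def E(f):
    """Exact expectation of f(L, B) under the product distribution."""
    return sum(quarter * f(l, b) for l in L_vals for b in B_vals)

var_perim = E(lambda l, b: (2 * (l + b)) ** 2) - E(lambda l, b: 2 * (l + b)) ** 2
var_area = E(lambda l, b: (l * b) ** 2) - E(lambda l, b: l * b) ** 2
cov_pa = (E(lambda l, b: 2 * (l + b) * l * b)
          - E(lambda l, b: 2 * (l + b)) * E(lambda l, b: l * b))
```

Here `var_perim` matches \(4(\sigma_1^2+\sigma_2^2)\), `var_area` matches \(\sigma_1^2\mu_2^2 + \sigma_2^2\mu_1^2 + \sigma_1^2\sigma_2^2\), and `cov_pa` equals \(2(\sigma_1^2\mu_2 + \sigma_2^2\mu_1)\), which is non-zero.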
1995 Paper 3 Q12
D: 1700.0 B: 1484.0

The random variables \(X\) and \(Y\) are independently normally distributed with means 0 and variances 1. Show that the joint probability density function for \((X,Y)\) is \[ \mathrm{f}(x,y)=\frac{1}{2\pi}\mathrm{e}^{-\frac{1}{2}(x^{2}+y^{2})}\qquad-\infty < x < \infty,-\infty < y < \infty. \] If \((x,y)\) are the coordinates, referred to rectangular axes, of a point in the plane, explain what is meant by saying that this density is radially symmetrical. The random variables \(U\) and \(V\) have a joint probability density function which is radially symmetrical (in the above sense). By considering the straight line with equation \(U=kV,\) or otherwise, show that \[ \mathrm{P}\left(\frac{U}{V} < k\right)=2\mathrm{P}(U < kV,V > 0). \] Hence, or otherwise, show that the probability density function of \(U/V\) is \[ \mathrm{g}(k)=\frac{1}{\pi(1+k^{2})}\qquad-\infty < k < \infty. \]
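No solution is given above; as a plausibility check (not a proof), a seeded Monte Carlo sketch compares the empirical distribution of \(U/V\) for independent standard normals with the CDF \(\tfrac12 + \tfrac1\pi\arctan k\) implied by \(\mathrm{g}\):

```python
import math
import random

# Fixed seed for reproducibility; this is a sanity check, not a derivation.
random.seed(0)
n = 200_000
ks = (-2.0, -1.0, 0.0, 1.0, 2.0)
counts = {k: 0 for k in ks}
for _ in range(n):
    r = random.gauss(0, 1) / random.gauss(0, 1)  # ratio of independent N(0,1)s
    for k in ks:
        if r < k:
            counts[k] += 1
empirical = {k: counts[k] / n for k in ks}
# Compare against the Cauchy CDF 1/2 + arctan(k)/pi obtained by integrating g
```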

1991 Paper 3 Q16
D: 1700.0 B: 1504.3

The random variables \(X\) and \(Y\) take integer values \(x\) and \(y\) respectively which are restricted by \(x\geqslant1,\) \(y\geqslant1\) and \(2x+y\leqslant2a\) where \(a\) is an integer greater than 1. The joint probability is given by \[ \mathrm{P}(X=x,Y=y)=c(2x+y), \] where \(c\) is a positive constant, within this region and zero elsewhere. Obtain, in terms of \(x,c\) and \(a,\) the marginal probability \(\mathrm{P}(X=x)\) and show that \[ c=\frac{6}{a(a-1)(8a+5)}. \] Show that when \(y\) is an even number the marginal probability \(\mathrm{P}(Y=y)\) is \[ \frac{3(2a-y)(2a+2+y)}{2a(a-1)(8a+5)} \] and find the corresponding expression when \(y\) is odd. Evaluate \(\mathrm{E}(Y)\) in terms of \(a\).
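No solution is given above, but the stated constants can be verified by direct enumeration of the support (a sketch with exact arithmetic; `check_formulas` is a name of my choosing):

```python
from fractions import Fraction

def check_formulas(a):
    """For integer a > 1, enumerate the support x >= 1, y >= 1, 2x + y <= 2a
    and verify the stated value of c and the even-y marginal exactly."""
    pts = [(x, y) for x in range(1, a) for y in range(1, 2 * (a - x) + 1)]
    c = Fraction(6, a * (a - 1) * (8 * a + 5))
    ok = sum(c * (2 * x + y) for x, y in pts) == 1  # c normalises P
    for y in range(2, 2 * a - 1, 2):                # even values of y
        marginal = sum(c * (2 * x + yy) for x, yy in pts if yy == y)
        claimed = Fraction(3 * (2 * a - y) * (2 * a + 2 + y),
                           2 * a * (a - 1) * (8 * a + 5))
        ok = ok and marginal == claimed
    return ok
```

For example, with \(a = 2\) the support is \((1,1), (1,2)\), the total weight is \(7c\), and \(c = \tfrac17\) agrees with \(\frac{6}{2 \cdot 1 \cdot 21}\).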

1990 Paper 3 Q16
D: 1700.0 B: 1484.0

  1. A rod of unit length is cut into pieces of length \(X\) and \(1-X\); the latter is then cut in half. The random variable \(X\) is uniformly distributed over \([0,1].\) For some values of \(X\) a triangle can be formed from the three pieces of the rod. Show that the conditional probability that, if a triangle can be formed, it will be obtuse-angled is \(3-2\sqrt{2}.\)
  2. The bivariate distribution of the random variables \(X\) and \(Y\) is uniform over the triangle with vertices \((1,0),(1,1)\) and \((0,1).\) A pair of values \(x,y\) is chosen at random from this distribution and a (perhaps degenerate) triangle \(ABC\) is constructed with \(BC=x\) and \(CA=y\) and \(AB=2-x-y.\) Show that the construction is always possible and that \(\angle ABC\) is obtuse if and only if \[ y>\frac{x^{2}-2x+2}{2-x}. \] Deduce that the probability that \(\angle ABC\) is obtuse is \(3-4\ln2.\)

Show Solution
  2. TikZ diagram
    The construction is possible if \(x + y > 2-x-y \Rightarrow x+y > 1\) (which holds as the support triangle lies on or above the line \(x+y=1\)), and \(x + (2-x-y) > y \Rightarrow 1 > y\) (true as the triangle lies below the line \(y = 1\)) and \(y + (2-x-y) > x \Rightarrow 1 > x\) (true as the triangle lies left of the line \(x = 1\)); on the boundary the triangle is degenerate, which is permitted. By the cosine rule (noting that the denominator \(2x(2-x-y)\) is positive for a non-degenerate triangle): \begin{align*} && y^2 &= x^2 + (2-x-y)^2 - 2 x (2-x-y) \cos \angle ABC \\ \Rightarrow && \cos \angle ABC &= \frac{x^2+(2-x-y)^2 - y^2}{2x(2-x-y)} \\ &&&= \frac{4+2x^2-4x-4y+2xy}{2x(2-x-y)} \\ \underbrace{\Rightarrow}_{\cos \angle ABC < 0} && 0 &> 4+2x^2-4x-4y+2xy \\ \Rightarrow && 0 &> 2x^2-4x+4 + 2(x-2)y \\ \underbrace{\Rightarrow}_{2-x \,>\, 0} && y &> \frac{x^2-2x+2}{2-x} \\ &&&= -x + \frac{2}{2-x} \end{align*}
    TikZ diagram
    The region where \(\angle ABC\) is obtuse is the part of the support triangle above this curve, so its area is: \begin{align*} A &= 1 - \int_0^1 \left ( -x + \frac{2}{2-x} \right)\d x \\ &= 1 - \left [-\frac12 x^2 - 2 \ln(2-x) \right]_0^1 \\ &= 1 + \frac12 -2 \ln 2 \\ &= \frac32 - 2 \ln 2 \end{align*} Dividing by the area \(\frac12\) of the support triangle gives the probability: \(\frac{\frac32 - 2 \ln 2}{1/2} = 3 - 4 \ln 2\)
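For completeness, a sketch of part (i), whose working does not appear above: the three pieces have lengths \(X\), \(\tfrac{1-X}{2}\) and \(\tfrac{1-X}{2}\).

```latex
A triangle is possible iff
\[
X < \tfrac{1-X}{2} + \tfrac{1-X}{2} = 1 - X \iff X < \tfrac12
\]
(the other two triangle inequalities hold automatically), so a triangle forms with
probability $\tfrac12$. The triangle is isosceles, so only the angle opposite the
side of length $X$ can be obtuse, which happens precisely when
\[
X^2 > \left(\tfrac{1-X}{2}\right)^2 + \left(\tfrac{1-X}{2}\right)^2
     = \tfrac{(1-X)^2}{2}
\iff \sqrt{2}\,X > 1 - X
\iff X > \sqrt{2} - 1 .
\]
Since $X$ is uniform on $[0,1]$,
\[
\mathrm{P}(\text{obtuse} \mid \text{triangle})
= \frac{\tfrac12 - (\sqrt{2} - 1)}{\tfrac12}
= 3 - 2\sqrt{2}.
\]
```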