Minimum mean squared error


Suppose that for random variables \(X, Y\) we wish to find the function \(f\) that best estimates \(Y\) from \(X\) in the mean squared error sense:

\(\displaystyle \min_f \mathbb{E} (Y - f (X))^2\)

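The solution (assuming \(\mathbb{E} Y^2 < \infty\)) is \(f^{\star} (X) =\mathbb{E} [Y|X]\). To show this, let \(e (X) =\mathbb{E} [Y|X] - f (X)\), add and subtract \(\mathbb{E} [Y|X]\) inside the square, and expand: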

\begin{eqnarray*} \mathbb{E} (Y - f (X))^2 & = & \mathbb{E} (Y -\mathbb{E} [Y|X] + \overbrace{\mathbb{E} [Y|X] - f (X)}^{= e (X)})^2\\ & = & \mathbb{E} (Y -\mathbb{E} [Y|X])^2 +\mathbb{E} \{ e^2 (X) \}\\ & & + 2\mathbb{E} \{ (Y -\mathbb{E} [Y|X]) e (X) \} \end{eqnarray*}

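The last term vanishes by the tower property: condition on \(X\), pull the factor \(e (X)\), which is a function of \(X\), out of the inner expectation, and note that \(\mathbb{E} [Y -\mathbb{E} [Y|X] |X] = 0\):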

\begin{eqnarray*} \mathbb{E} \{ \overbrace{(Y -\mathbb{E} [Y|X]) e (X)}^{\ast} \} & = & \mathbb{E} \{ \overbrace{\mathbb{E} \{ (Y -\mathbb{E} [Y|X]) e (X) |X \}}^{=\mathbb{E} [\ast |X]} \}\\ & = & \mathbb{E} \{ e (X) \mathbb{E} \{ Y -\mathbb{E} [Y|X] |X \} \}\\ & = & \mathbb{E} \{ e (X) (\mathbb{E} [Y|X] -\mathbb{E} [Y|X]) \}\\ & = & 0 \end{eqnarray*}
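As a sanity check on the cross-term argument, the sketch below computes \(\mathbb{E} \{ (Y -\mathbb{E} [Y|X]) e (X) \}\) exactly for a small discrete joint distribution. The probabilities in `JOINT` and the estimator \(f (x) = 3 x - 1\) are hypothetical choices made only for illustration; exact rational arithmetic via `fractions.Fraction` avoids floating-point noise, so the result is exactly zero rather than merely close to it.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf of (X, Y): maps (x, y) -> P(X = x, Y = y).
JOINT = {
    (0, 1): Fraction(1, 8),
    (0, 3): Fraction(1, 8),
    (1, 0): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
    (1, 5): Fraction(1, 4),
}


def conditional_mean(joint):
    """Return a dict mapping x -> E[Y | X = x]."""
    px = defaultdict(Fraction)  # P(X = x)
    sy = defaultdict(Fraction)  # sum over y of y * P(X = x, Y = y)
    for (x, y), p in joint.items():
        px[x] += p
        sy[x] += p * y
    return {x: sy[x] / px[x] for x in px}


def cross_term(joint, f):
    """E[(Y - E[Y|X]) * e(X)] where e(X) = E[Y|X] - f(X)."""
    m = conditional_mean(joint)
    return sum(p * (y - m[x]) * (m[x] - f(x)) for (x, y), p in joint.items())


def f(x):
    # An arbitrary, deliberately non-optimal estimator (hypothetical).
    return 3 * x - 1


print(cross_term(JOINT, f))  # 0
```

Any other choice of `f` gives zero as well, because within each value of \(x\) the residuals \(y - \mathbb{E} [Y|X = x]\) average to zero and \(e (x)\) is constant.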

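The first term is the expected conditional variance of \(Y\) given \(X\). Writing \(\operatorname{var} (Y|X) =\mathbb{E} [(Y -\mathbb{E} [Y|X])^2 |X]\) and applying the tower property again,

\(\displaystyle \mathbb{E} [\operatorname{var} (Y|X)] =\mathbb{E} (Y -\mathbb{E} [Y|X])^2\)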

Substituting both results, the mean squared error decomposes as

\(\displaystyle \mathbb{E} (Y - f (X))^2 =\mathbb{E} [\operatorname{var} (Y|X)] +\mathbb{E} (\mathbb{E} [Y|X] - f (X))^2\)
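The decomposition can be checked exactly on a toy example. The sketch below reuses a made-up discrete joint pmf (`JOINT` and the three estimators are hypothetical) and verifies with exact fractions that the mean squared error of each estimator equals the expected conditional variance plus the squared gap between \(\mathbb{E} [Y|X]\) and \(f (X)\).

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf of (X, Y): maps (x, y) -> P(X = x, Y = y).
JOINT = {(0, 1): Fraction(1, 8), (0, 3): Fraction(1, 8),
         (1, 0): Fraction(1, 4), (1, 2): Fraction(1, 4), (1, 5): Fraction(1, 4)}


def conditional_mean(joint):
    """Return a dict mapping x -> E[Y | X = x]."""
    px, sy = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        px[x] += p
        sy[x] += p * y
    return {x: sy[x] / px[x] for x in px}


def mse(joint, f):
    """E(Y - f(X))^2."""
    return sum(p * (y - f(x)) ** 2 for (x, y), p in joint.items())


def expected_cond_var(joint):
    """E[var(Y|X)] = E(Y - E[Y|X])^2."""
    m = conditional_mean(joint)
    return sum(p * (y - m[x]) ** 2 for (x, y), p in joint.items())


def excess(joint, f):
    """E(E[Y|X] - f(X))^2, the part of the error that depends on f."""
    m = conditional_mean(joint)
    return sum(p * (m[x] - f(x)) ** 2 for (x, y), p in joint.items())


M = conditional_mean(JOINT)


def f_star(x):
    return M[x]  # the conditional mean E[Y | X = x]


def f_linear(x):
    return 3 * x - 1  # hypothetical competitor


def f_const(x):
    return Fraction(2)  # hypothetical competitor that ignores x


for f in (f_star, f_linear, f_const):
    assert mse(JOINT, f) == expected_cond_var(JOINT) + excess(JOINT, f)

print(expected_cond_var(JOINT), mse(JOINT, f_star))  # 41/12 41/12
```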

The first term (the expected conditional variance) does not depend on our choice of \(f\). The second term is non-negative and is zero exactly when \(f (X) =\mathbb{E} [Y|X]\) almost surely, so \(f^{\star} (X) =\mathbb{E} [Y|X]\) minimizes the mean squared error, and the minimum value is \(\mathbb{E} [\operatorname{var} (Y|X)]\).
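A quick Monte Carlo illustration, under an assumed model chosen only for this sketch: \(X \sim \mathrm{Uniform} (0, 1)\) and \(Y = X^2 + N\) with \(N \sim \mathcal{N} (0, \sigma^2)\) independent of \(X\), so that \(\mathbb{E} [Y|X] = X^2\) and \(\mathbb{E} [\operatorname{var} (Y|X)] = \sigma^2\). The competitors are \(x - 1 / 6\), which is the best linear predictor of \(Y\) from \(X\) under this model, and the constant \(\mathbb{E} Y = 1 / 3\). Their excess errors \(\mathbb{E} (\mathbb{E} [Y|X] - f (X))^2\) work out to \(1 / 180\) and \(4 / 45\) respectively, so the simulated errors should land near \(\sigma^2\), \(\sigma^2 + 1 / 180\), and \(\sigma^2 + 4 / 45\).

```python
import random


def simulate(n=100_000, sigma=0.1, seed=0):
    """Estimate E(Y - f(X))^2 for several estimators by simulation.

    Assumed model (for illustration only):
    X ~ Uniform(0, 1), Y = X**2 + N, N ~ Normal(0, sigma**2) independent of X,
    so E[Y|X] = X**2 and E[var(Y|X)] = sigma**2.
    """
    rng = random.Random(seed)
    estimators = {
        "conditional mean": lambda x: x * x,  # E[Y|X]
        "linear": lambda x: x - 1 / 6,        # best linear predictor under this model
        "constant": lambda x: 1 / 3,          # E[Y], ignores X entirely
    }
    sq_err = {name: 0.0 for name in estimators}
    for _ in range(n):
        x = rng.random()
        y = x * x + rng.gauss(0.0, sigma)
        for name, f in estimators.items():
            sq_err[name] += (y - f(x)) ** 2
    return {name: total / n for name, total in sq_err.items()}


results = simulate()
for name, err in results.items():
    print(f"{name:>16}: {err:.4f}")
```

Even the best linear fit pays a strictly positive excess over \(\sigma^2\), since \(\mathbb{E} [Y|X] = X^2\) is not linear in \(X\).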