Suppose that, for random variables \(X, Y\), we wish to find the estimator
\(f\) of \(Y\) from \(X\) that minimizes the mean squared error:
\(\displaystyle \min_f \mathbb{E} (Y - f (X))^2\)
The solution is \(f^{\star} (X) =\mathbb{E} [Y|X]\). To show this, let
\(e (X) =\mathbb{E} [Y|X] - f (X)\) and expand the square:
\begin{eqnarray*}
\mathbb{E} (Y - f (X))^2 & = & \mathbb{E} (Y
-\mathbb{E} [Y|X] +
\overbrace{\mathbb{E} [Y|X] - f (X)}^{= e
(X)})^2\\
& = & \mathbb{E} (Y -\mathbb{E} [Y|X])^2 +\mathbb{E} \{ e^2
(X) \}\\
& & + 2\mathbb{E} \{ (Y -\mathbb{E} [Y|X]) e (X)
\}
\end{eqnarray*}
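We can show that the last term vanishes by the tower property, using the
fact that \(e (X)\) is a function of \(X\) and can therefore be pulled out
of the conditional expectation given \(X\):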
\begin{eqnarray*}
\mathbb{E} \{ \overbrace{(Y -\mathbb{E} [Y|X]) e
(X)}^{\ast} \} & = &
\mathbb{E} \{ \overbrace{\mathbb{E} \{ (Y
-\mathbb{E} [Y|X]) e (X) |X
\}}^{=\mathbb{E} [\ast |X]} \}\\
& = &
\mathbb{E} \{ e (X) \mathbb{E} \{ Y -\mathbb{E} [Y|X] |X \} \}\\
& = &
\mathbb{E} \{ e (X) (\mathbb{E} [Y|X] -\mathbb{E} [Y|X]) \}\\
& = &
0
\end{eqnarray*}
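Additionally, the first term is the expected conditional variance, since
\(\operatorname{var} (Y|X) =\mathbb{E} \{ (Y -\mathbb{E} [Y|X])^2 |X \}\)
and the tower property gives: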
\(\displaystyle \mathbb{E}\{\operatorname{var} (Y|X)\} = \mathbb{E} (Y -\mathbb{E} [Y|X])^2\)
So we can rewrite the mean squared error:
\(\displaystyle \mathbb{E} (Y - f (X))^2 =\mathbb{E}\{\operatorname{var} (Y|X)\}
+\mathbb{E} (\mathbb{E}
[Y|X] - f (X))^2\)
The first term (the expected conditional variance) does not depend on our
choice of \(f\). The second term is always non-negative, and vanishes
when we choose \(f^{\star} (X) =\mathbb{E} [Y|X]\), so this choice
minimizes the mean squared error.
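
The decomposition can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the derivation: the toy model \(Y = X^2 + \varepsilon\) (with \(X\) uniform on \([- 1, 1]\) and \(\varepsilon\) independent Gaussian noise of variance \(0.25\)), the function name `mse_decomposition`, the sample size, and the seed are all assumptions chosen for illustration. For this model \(\mathbb{E} [Y|X] = X^2\), so the three terms of the expansion can be estimated directly.

```python
import numpy as np


def mse_decomposition(f, n=200_000, seed=0):
    """Estimate the terms of the MSE expansion for a toy model.

    Assumed model (illustrative only): Y = X^2 + noise, with X uniform on
    [-1, 1] and noise ~ N(0, 0.5^2) independent of X, so E[Y|X] = X^2.
    Returns (mse, noise_term, approx_term, cross_term).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    noise = rng.normal(0.0, 0.5, size=n)
    y = x**2 + noise

    cond_mean = x**2            # E[Y|X] under the assumed model
    e = cond_mean - f(x)        # e(X) from the derivation

    mse = np.mean((y - f(x)) ** 2)                # E (Y - f(X))^2
    noise_term = np.mean((y - cond_mean) ** 2)    # estimates E{var(Y|X)}
    approx_term = np.mean(e**2)                   # E (E[Y|X] - f(X))^2
    cross_term = 2.0 * np.mean((y - cond_mean) * e)
    return mse, noise_term, approx_term, cross_term


if __name__ == "__main__":
    candidates = [
        ("conditional mean x^2", lambda x: x**2),
        ("linear x", lambda x: x),
        ("zero", lambda x: np.zeros_like(x)),
    ]
    for name, f in candidates:
        mse, noise_term, approx_term, cross_term = mse_decomposition(f)
        print(f"{name}: mse={mse:.4f} noise={noise_term:.4f} "
              f"approx={approx_term:.4f} cross={cross_term:.4f}")
```

In this sketch the cross term should be close to zero for every candidate (up to Monte Carlo error), the noise term should be the same for every candidate, and the mean squared error should be smallest for the conditional mean.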