Suppose we have input vectors \(\{ x_i \}_{i = 1}^n\) and corresponding scalar targets \(\{ y_i \}_{i = 1}^n\), where each input vector \(x_i \in \mathbb{R}^d\). We can arrange the inputs as a matrix \(X\) and the targets as a vector \(y\):
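\[
X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix} \in \mathbb{R}^{n \times d}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \in \mathbb{R}^n.
\]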
Assume the conditional distribution \(y_i \mid x_i, w\) is normal with mean \(x_i^T w\) and known variance \(\sigma^2\). Under an i.i.d. assumption on the data, the likelihood is:
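\[
p(y \mid X, w) = \prod_{i=1}^n \mathcal{N}(y_i \mid x_i^T w, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_i - x_i^T w)^2}{2\sigma^2} \right).
\]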
We will also assume an isotropic, zero-mean Gaussian prior on the weights with known variance \(s^2\):
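\[
p(w) = \mathcal{N}(w \mid 0, s^2 I) = \frac{1}{(2\pi s^2)^{d/2}} \exp\!\left( -\frac{\|w\|_2^2}{2 s^2} \right).
\]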
We can use Bayes' rule to obtain the posterior distribution over the weights:
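\[
p(w \mid X, y) = \frac{p(y \mid X, w)\, p(w)}{p(y \mid X)} \propto p(y \mid X, w)\, p(w).
\]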
It turns out that the posterior is itself normal, with a mean and covariance we can solve for exactly, though the algebra is a bit messy. Rather than finding the full posterior distribution, here we just want the MAP estimate, i.e. the single value of \(w\) that maximizes the log posterior density:
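\[
w_{\mathrm{MAP}} = \arg\max_w \; \log p(w \mid X, y) = \arg\max_w \; \big[ \log p(y \mid X, w) + \log p(w) \big].
\]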
We can expand the MAP objective:
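\[
\log p(y \mid X, w) + \log p(w) = -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^T w)^2 - \frac{1}{2 s^2} \|w\|_2^2 + \mathrm{const},
\]
where the constant collects all terms that do not depend on \(w\).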
Dropping the constant and multiplying the objective by \(- 2 \sigma^2\) gives us a minimization problem:
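\[
w_{\mathrm{MAP}} = \arg\min_w \; \sum_{i=1}^n (y_i - x_i^T w)^2 + \frac{\sigma^2}{s^2} \|w\|_2^2.
\]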
This objective is identical to L2-regularized least squares with regularization constant \(\lambda = \sigma^2 / s^2\).
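The equivalence can be sanity-checked numerically. Below is a minimal NumPy sketch (the dimensions, variances, and data here are made up for illustration) that forms the closed-form ridge solution \(w = (X^T X + \lambda I)^{-1} X^T y\), a standard result not derived above, which minimizes the regularized objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and (assumed known) noise/prior variances.
n, d = 50, 3
sigma2 = 0.5          # noise variance sigma^2
s2 = 2.0              # prior variance s^2
lam = sigma2 / s2     # regularization constant lambda = sigma^2 / s^2

# Synthetic data drawn from the assumed model.
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + np.sqrt(sigma2) * rng.normal(size=n)

# Closed-form minimizer of  sum_i (y_i - x_i^T w)^2 + lam * ||w||^2.
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def objective(w):
    """The L2-regularized least-squares objective from the derivation."""
    return np.sum((y - X @ w) ** 2) + lam * np.sum(w ** 2)
```

Note that a broad prior (large \(s^2\)) drives \(\lambda\) toward zero, recovering ordinary least squares, while a tight prior shrinks the weights toward the prior mean of zero.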