Regularization

Learn what regularization is and how it affects validation and training error.

Regularization is a technique for reducing the variance of a model, for example by restricting its parameters to a subset of the parameter space. Reducing variance, in turn, helps prevent overfitting.

Why not choose a simple model?

One option is to start with a model that's too simple and gradually increase its complexity while monitoring its performance on the validation data. Regularization works in the reverse direction: it starts with a complex model and decreases its effective complexity.

The answer to this question has more to do with implementation than with theory. Regularization can be applied more systematically than gradually increasing model complexity. Furthermore, different regularization methods reduce a model's variance in different ways, and one may suit a given task better than another.

Let’s explore one approach to regularization called the shrinkage method, which “shrinks” model parameters to reduce overfitting.

Shrinkage method

Shrinkage-based regularization keeps model parameters (weights) close to zero. This reduces the impact of any single parameter and prevents the model from fitting the noise in data.

Instead of setting exact upper/lower limits for each parameter, we can combine parameter shrinkage and loss minimization into a single goal:

$$\min_{\mathbf{w}}\{L(f_{\mathbf{w}}(\mathbf{x}),\mathbf{y}) + R(\mathbf{w})\}$$

  • $L(f_{\mathbf{w}}(\mathbf{x}), \mathbf{y})$ = loss (how well the model fits the data)
  • $R(\mathbf{w})$ = shrinkage function (penalty for large weights)
  • $\mathbf{w}$ = vector of model parameters

Shrinking the parameters helps prevent overfitting while still trying to fit the data well.

Note: The goal is to minimize the loss $L$ while shrinking the parameters $\mathbf{w}$ as much as possible.
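
To make the combined objective concrete, here is a minimal NumPy sketch. The linear model, the mean-squared-error loss, and the penalty strength `lam` are illustrative assumptions rather than details from the lesson:

```python
import numpy as np

def regularized_objective(w, X, y, lam=0.1):
    """Mean squared error plus an L2-style shrinkage penalty.

    The linear model f_w(x) = Xw, the MSE loss, and the penalty
    strength `lam` are illustrative choices for this sketch.
    """
    data_loss = np.mean((X @ w - y) ** 2)  # L(f_w(x), y)
    penalty = lam * np.sum(w ** 2)         # R(w)
    return data_loss + penalty

# Toy data: increasing lam pushes the minimizing weights toward zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)
print(regularized_objective(np.zeros(3), X, y))
```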

Shrinkage functions

The regularization term $R(\mathbf{w})$ (the shrinkage function) is calculated using mathematical norms of the weight vector $\mathbf{w}$. The two most popular choices are the L2 norm and the L1 norm.

  • The L2 norm of a vector $\mathbf{w}$ is denoted as $\|\mathbf{w}\|_2$.
  • The L1 norm is denoted as $\|\mathbf{w}\|_1$.
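
As a quick illustration (a sketch, not part of the lesson's material), both norms can be computed with NumPy's `np.linalg.norm`; the example vector `w` is an arbitrary choice:

```python
import numpy as np

w = np.array([3.0, -4.0, 0.0, 1.0])  # example weight vector

l2 = np.linalg.norm(w, ord=2)  # sqrt(9 + 16 + 0 + 1) ≈ 5.10
l1 = np.linalg.norm(w, ord=1)  # |3| + |-4| + |0| + |1| = 8.0
print(l2, l1)
```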

Let’s examine the L2 norm.

Implementation of L2 norm

The L2 norm measures the straight-line distance (Euclidean distance) of the weight vector from the origin.

Mathematically, given the weight vector $\mathbf{w} = \begin{bmatrix} w_{0} \\ w_{1} \\ \vdots \\ w_{n} \end{bmatrix}$, the L2 norm is computed as $\|\mathbf{w}\|_2 = \sqrt{w_0^2 + w_1^2 + \cdots + w_n^2}$.
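
As a sanity check on this definition, here is a from-scratch sketch of the L2 norm compared against NumPy's built-in; the example vector is an assumption chosen for illustration:

```python
import numpy as np

def l2_norm(w):
    """Euclidean distance of w from the origin: sqrt of the sum of squared entries."""
    return np.sqrt(np.sum(w ** 2))

w = np.array([1.0, 2.0, 2.0])
print(l2_norm(w))         # 3.0, since sqrt(1 + 4 + 4) = 3
print(np.linalg.norm(w))  # NumPy's built-in gives the same value
```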