
Regularization

Explore regularization techniques that reduce overfitting by constraining model parameters using shrinkage methods. Understand L1 and L2 norms, their effects on feature weights, and how the regularization parameter controls model complexity. Gain practical Python skills to apply these concepts for better supervised learning models.

Regularization is a technique for reducing a model's variance. One common approach restricts the parameters to a subset of the parameter space; this reduction in variance, in turn, prevents overfitting.

Why not choose a simple model?

One solution is to start with a model that's too simple and gradually increase its complexity while monitoring its performance on the test data. Regularization does the reverse: it starts with a complex model and decreases its complexity.

The answer to why we don't simply choose a simple model has more to do with implementation than theory. Regularization can be implemented more systematically than gradually increasing model complexity. Furthermore, different regularization methods reduce the model's variance in different ways, and one may suit the task at hand better than another.

Let’s explore one approach to regularization called the shrinkage method, which “shrinks” model parameters to reduce overfitting.

Shrinkage method

Shrinkage-based regularization keeps model parameters (weights) close to zero. This reduces the impact of any single parameter and prevents the model from fitting the noise in data.

Instead of setting exact upper/lower limits for each parameter, we can combine parameter shrinkage and loss minimization into a single goal:

$$\min_{\mathbf{w}}\{L(f_{\mathbf{w}}(\mathbf{x}),\mathbf{y}) + R(\mathbf{w})\}$$

  • $L(f_{\mathbf{w}}(\mathbf{x}), \mathbf{y})$ = loss (how well the model fits the data)
  • $R(\mathbf{w})$ = shrinkage function (penalty for large weights)
  • $\mathbf{w}$ = vector of model parameters

Shrinking the parameters helps prevent overfitting while still trying to fit the data well.

Note: The goal is to minimize the loss $L$ while shrinking the parameters $\mathbf{w}$ as much as possible.
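The combined objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the toy data, the learning rate, and the regularization strength `lam` are all assumptions made for the example, and the penalty shown is an L2 (squared-norm) shrinkage term.

```python
import numpy as np

# Toy data (assumed for illustration): y = 3x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
y = 3 * x[:, 0] + rng.normal(scale=0.5, size=50)

def objective(w, x, y, lam):
    """Regularized objective: loss L(f_w(x), y) plus penalty R(w)."""
    residuals = x @ w - y
    loss = np.mean(residuals ** 2)   # L: mean squared error
    penalty = lam * np.sum(w ** 2)   # R: L2 shrinkage term
    return loss + penalty

# Simple gradient descent on the combined objective
w = np.zeros(1)
lam, lr = 0.1, 0.05
for _ in range(500):
    grad = 2 * x.T @ (x @ w - y) / len(y) + 2 * lam * w
    w -= lr * grad

print(w)  # the fitted weight, shrunk toward zero by the penalty
```

Because the penalty grows with the size of the weights, the minimizer lands on a smaller weight than the unpenalized least-squares fit would; increasing `lam` shrinks it further toward zero.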

Shrinkage functions

The regularization term $R(\mathbf{w})$ (the shrinkage function) is calculated using mathematical norms of the weight vector $\mathbf{w}$. The two most popular choices are the L2 norm and the L1 norm.
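Both norms are straightforward to compute by hand; the small vector below is an assumed example chosen so the arithmetic is easy to check.

```python
import numpy as np

w = np.array([3.0, -4.0, 0.0])

l2 = np.sqrt(np.sum(w ** 2))   # L2 norm: sqrt(3^2 + (-4)^2 + 0^2) = 5.0
l1 = np.sum(np.abs(w))         # L1 norm: |3| + |-4| + |0| = 7.0

print(l2, l1)  # 5.0 7.0
```

NumPy also provides these directly as `np.linalg.norm(w)` for L2 and `np.linalg.norm(w, 1)` for L1.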

  • L2 norm of a vector $\mathbf{w}$
...