
Ridge and Lasso Regression

Understand the principles of ridge and lasso regression, two key regularization methods in supervised learning. This lesson guides you through the linear model, squared loss, and penalty functions. Discover how ridge shrinks coefficients evenly, while lasso can reduce some to zero for feature selection. Visualizations and code examples deepen your insight into their differences and practical applications.

In the previous lesson, we saw how regularization helps control overfitting by penalizing large weights and balancing the bias-variance trade-off. We also introduced the L1 and L2 penalties and discussed their general effects on model behavior. Now we focus specifically on ridge and lasso regression and examine how these penalties change the solution. Using the same linear model and squared loss, we compare ridge and lasso through their objective functions and visualize their behavior using MSE contours. This geometric perspective helps explain why ridge shrinks all coefficients, while lasso can drive some coefficients exactly to zero.
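
As a quick illustration of that contrast, the following is a minimal sketch (not this lesson's own code) that assumes scikit-learn, synthetic data from make_regression, and an arbitrary penalty strength of alpha=1.0. Fitting ridge and lasso on the same data typically shows ridge keeping every coefficient small but nonzero, while lasso sets several of them exactly to zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: 10 features, only 3 of which actually influence y
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Same linear model and squared loss; only the penalty differs
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("ridge coefficients:", np.round(ridge.coef_, 3))
print("lasso coefficients:", np.round(lasso.coef_, 3))

# Ridge shrinks coefficients but (typically) leaves none exactly zero;
# lasso usually zeroes out several of the uninformative features
print("exact zeros (ridge):", int(np.sum(ridge.coef_ == 0)))
print("exact zeros (lasso):", int(np.sum(lasso.coef_ == 0)))
```

Increasing the penalty strength makes this gap more pronounced: ridge coefficients shrink further toward zero without reaching it, while lasso zeroes out more and more features.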

Ridge and Lasso objectives

Both Ridge and Lasso regression are special forms of regularized linear regression. They use the simplest model type (linear model) and the standard way to measure error (squared loss), differing only in their regularization penalty.
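
For orientation, here is the general shape both objectives take, written in one common convention (the exact scaling of the loss term varies across texts); $\lambda \ge 0$ is the regularization strength, the sum in the penalty runs over the weights, and the model $f_{\mathbf{w}}$ and the squared loss are defined in the next subsection:

$$
\text{Ridge:}\quad \min_{\mathbf{w}}\ \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f_{\mathbf{w}}(\mathbf{x}_i)\bigr)^2 \;+\; \lambda \sum_{j} w_j^2
$$

$$
\text{Lasso:}\quad \min_{\mathbf{w}}\ \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f_{\mathbf{w}}(\mathbf{x}_i)\bigr)^2 \;+\; \lambda \sum_{j} |w_j|
$$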

The core model and loss function

Before introducing the penalty, we must define the model that makes a prediction and the loss function that measures the error.

Linear model ($f_{\mathbf{w}}$)

A linear model assumes the output ($\hat{y}_i$, the prediction) is a simple, weighted sum of the inputs ($\mathbf{x}_i$). The goal is to find the best set of weights ($\mathbf{w}$) that connect the inputs to the output.

  • We have $n$ training examples, $D = \{(\mathbf{x}_i, y_i) \mid 1 \le i \le n\}$. Each input $\mathbf{x}_i$
...