Regularization (Lasso, Ridge, and ElasticNet Regression)
Learn about Regularization, a technique that helps us deal with overfitting in Machine Learning models.
Regularization
We use the term overfitting to describe a model that performs well on the training dataset but fails to generalize to unseen or test data. Such a model is also said to suffer from high variance. Overfitting on the training data can be illustrated as:
$J(w) \approx 0$
In other words, the predicted values are so close to the actual values that the cost goes to zero and the model has memorized the training data.
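As a minimal sketch of this situation, the NumPy snippet below fits a degree-5 polynomial through six noisy points. The data, the noise level, the underlying curve, and the polynomial degree are all illustrative assumptions rather than part of the lesson. Because the model has as many coefficients as there are training points, it can pass through every one of them, so the training cost is essentially zero while the cost on new points is not.

```python
import numpy as np

def cost(y_hat, y):
    """Squared-error cost J = 1/(2m) * sum((y_hat - y)^2)."""
    m = len(y)
    return np.sum((y_hat - y) ** 2) / (2 * m)

rng = np.random.default_rng(0)

# Six noisy training points drawn from an assumed underlying curve sin(3x).
x_train = np.linspace(-1.0, 1.0, 6)
y_train = np.sin(3 * x_train) + rng.normal(scale=0.3, size=x_train.shape)

# Unseen points from the same curve (noise-free for simplicity).
x_test = np.array([-0.9, -0.45, 0.0, 0.45, 0.9])
y_test = np.sin(3 * x_test)

# A degree-5 polynomial has 6 coefficients, so it can interpolate all 6 points.
coeffs = np.polyfit(x_train, y_train, deg=5)

train_cost = cost(np.polyval(coeffs, x_train), y_train)
test_cost = cost(np.polyval(coeffs, x_test), y_test)

print(f"training cost: {train_cost:.2e}")
print(f"test cost:     {test_cost:.2e}")
```

The training cost printed here should be tiny compared with the test cost, which is the gap that the techniques in this lesson try to close.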
How high variance (overfitting) can be reduced

The first strategy is to look for more training data so that the data has more variety in it.

Regularization, which will be the focus of this part of the lesson, is also used to tackle overfitting.

Employ good Feature Selection techniques.

There are also some specific Deep Learning techniques for reducing high variance.
Now, we will look at how various Regularization techniques are used to overcome overfitting.
Ridge Regression
The following shows how the cost function is modified in Ridge Regression, sometimes called L2 Regularization.
$J(w) = \frac{1}{2m}\left[\sum_{i=1}^{m}(\hat{y}^i - y^i)^2 + \lambda \sum_{j=1}^{n}w_j^2\right]$

In Ridge Regression, we minimize the above function.

$\lambda$ is called the regularization parameter.

Choosing too high a value of $\lambda$ shrinks the parameters $(w_1, w_2, ...)$ toward zero, resulting in underfitting (also called High Bias) because the model won’t perform well even on the training dataset. Notice that the parameter $w_0$ is not included in the regularization term, meaning its value is not penalized.

Choosing too small a value of $\lambda$ makes the term $\lambda \sum_{j=1}^{n}w_j^2$ have a negligible effect on the parameters $(w_1, w_2, ...)$, and the model reduces to ordinary Linear Regression. So, choosing $\lambda$ is part of hyperparameter optimization.
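The effect of $\lambda$ described above can be sketched in NumPy. The helper names ridge_cost and ridge_fit, the synthetic data, and the chosen $\lambda$ values are assumptions made for illustration. The fit uses the closed-form solution obtained by setting the gradient of $J(w)$ to zero, $(X^TX + \lambda D)w = X^Ty$, where $D$ is the identity matrix with its first diagonal entry set to zero so that $w_0$ is left out of the penalty.

```python
import numpy as np

def ridge_cost(X, y, w, lam):
    """J(w) = 1/(2m) * [sum((y_hat - y)^2) + lam * sum(w_j^2 for j >= 1)]."""
    m = len(y)
    residual = X @ w - y
    penalty = lam * np.sum(w[1:] ** 2)  # w[0] is the intercept w_0, not penalized
    return (residual @ residual + penalty) / (2 * m)

def ridge_fit(X, y, lam):
    """Minimize ridge_cost by solving (X^T X + lam * D) w = X^T y."""
    D = np.eye(X.shape[1])
    D[0, 0] = 0.0  # leave w_0 out of the penalty
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

rng = np.random.default_rng(0)
m = 50
features = rng.normal(size=(m, 3))
X = np.column_stack([np.ones(m), features])  # first column of ones for w_0
true_w = np.array([4.0, 2.0, -3.0, 0.5])     # assumed, for illustration only
y = X @ true_w + rng.normal(scale=0.5, size=m)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # plain Linear Regression

for lam in [0.0, 1.0, 100.0, 10000.0]:
    w = ridge_fit(X, y, lam)
    # Passing lam=0.0 to ridge_cost reports the squared-error part only.
    print(f"lambda={lam:>8}: w = {np.round(w, 3)}, "
          f"training error = {ridge_cost(X, y, w, 0.0):.3f}")
```

With $\lambda = 0$ the result should match plain Linear Regression, and as $\lambda$ grows the weights $w_1, w_2, ...$ shrink toward zero while the training error rises, which is the underfitting behaviour described above.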
Ridge Regression in Scikit Learn
The Ridge class is used to build a Ridge Regression model.
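A minimal usage sketch follows, assuming scikit-learn and NumPy are installed. The synthetic data and the alpha values are illustrative assumptions. In scikit-learn the regularization parameter is called alpha. Its documented objective omits the $\frac{1}{2m}$ factor, but that constant scaling does not change the minimizer, so alpha plays the role of $\lambda$ here. The snippet also uses RidgeCV, which picks alpha from a list of candidates by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
true_coef = np.array([3.0, -2.0, 0.0, 1.5])  # assumed, for illustration only
y = X @ true_coef + 1.0 + rng.normal(scale=0.5, size=60)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha plays the role of lambda

print("Linear Regression coefficients:", np.round(ols.coef_, 3))
print("Ridge coefficients:            ", np.round(ridge.coef_, 3))
print("Ridge intercept (w_0):         ", round(float(ridge.intercept_), 3))

# Pick alpha by cross-validation from a hand-chosen list of candidates.
candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
ridge_cv = RidgeCV(alphas=candidate_alphas).fit(X, y)
print("alpha selected by RidgeCV:", ridge_cv.alpha_)
```

The Ridge coefficients should come out smaller in magnitude overall than the Linear Regression ones, and the intercept is fitted separately from the penalized coefficients.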