L1 and L2 regularization

L1 and L2 are two of the most common methods to regularize a decision boundary. Regularization is the technical name for the operation that we informally called “smoothing out.” L1 and L2 work similarly and have mostly similar effects. Once we get into advanced ML territory, we may want to look deeper into their relative merits, but for our purposes in this course, we follow a simple rule: either pick randomly between L1 and L2, or try both and see which one works better.

Let’s see how L1 and L2 work.

How L1 and L2 work

L1 and L2 rely on the same idea. They add a regularization term to the neural network’s loss. For example, here’s the loss augmented by L1 regularization:

\large{ L_{\text{regularized}}=L_{\text{non-regularized}}+\lambda \sum{|w|}}
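To make the formula concrete, here is a minimal sketch in plain Python that computes the regularized loss from a list of weights. The weight values, the base loss, and the value of lambda are made up for illustration, and the function names are hypothetical. The L2 version is included for comparison and assumes the common form that sums squared weights instead of absolute values; some libraries scale that term differently.

```python
def l1_penalty(weights, lambda_):
    """Return lambda * sum(|w|), the L1 term from the formula above."""
    return lambda_ * sum(abs(w) for w in weights)


def l2_penalty(weights, lambda_):
    """Return lambda * sum(w^2), assumed here as the L2 counterpart."""
    return lambda_ * sum(w * w for w in weights)


# Hypothetical values, chosen only for illustration.
weights = [0.5, -2.0, 1.5]
base_loss = 0.25  # stands in for the non-regularized loss
lambda_ = 0.5     # regularization strength

l1_loss = base_loss + l1_penalty(weights, lambda_)
l2_loss = base_loss + l2_penalty(weights, lambda_)

print(l1_loss)  # 2.25
print(l2_loss)  # 3.5
```

In both cases, larger weights produce a larger penalty, so minimizing the regularized loss pushes the network toward smaller weights. Setting `lambda_` to zero recovers the non-regularized loss.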
