Regularization is a set of techniques used to reduce overfitting (i.e., to reduce generalization error) so that a model performs well on data it hasn't seen before. Many regularization methods work by keeping the network's weights small, which improves the model's performance on new inputs. Three of the most popular and effective techniques are L1, L2, and dropout.
In the case of L1 regularization (also known as Lasso regression), we add a regularization term, Ω, to the cost function. This term is the sum of the absolute values of the weight parameters in a weight matrix. L1 encourages weights toward 0.0 (where possible), which results in sparser weights (more weights with values equal to 0.0). Hence, the cost function in L1 becomes the original cost plus λ · Σᵢ |wᵢ|, where λ controls the strength of the penalty.
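As a minimal sketch of the idea (the function name, the λ value of 0.01, and the example weights are illustrative assumptions, not from the text), the L1 term added to the cost could be computed like this:

```python
import numpy as np

def l1_penalty(weights, lam=0.01):
    # Omega for L1: lambda times the sum of absolute weight values.
    return lam * np.sum(np.abs(weights))

w = np.array([0.5, -1.2, 0.0, 3.0])   # hypothetical weight vector
data_loss = 0.75                      # hypothetical unregularized loss
total_loss = data_loss + l1_penalty(w)  # cost the optimizer would minimize
```

Because the penalty grows linearly with each weight's magnitude, the gradient pushes every weight by a constant amount toward zero, which is what produces exact zeros (sparsity).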
L2 regularization (also known as Ridge regression) offers more nuance by penalizing larger weights more severely, resulting in weights that are less sparse. The regularization term Ω is based on the Euclidean norm (or L2 norm) of the weight matrix: it is the sum of all squared weight values. The cost function in L2 becomes the original cost plus λ · Σᵢ wᵢ².
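A parallel sketch for the L2 term (again, the name, λ value, and weights are illustrative assumptions):

```python
import numpy as np

def l2_penalty(weights, lam=0.01):
    # Omega for L2: lambda times the sum of squared weight values.
    return lam * np.sum(weights ** 2)

w = np.array([0.5, -1.2, 0.0, 3.0])   # hypothetical weight vector
data_loss = 0.75                      # hypothetical unregularized loss
total_loss = data_loss + l2_penalty(w)
```

Note how squaring changes the behavior: the weight 3.0 contributes 9.0 to the sum while 0.5 contributes only 0.25, so large weights dominate the penalty. The gradient shrinks each weight proportionally to its size, which pulls weights close to zero without forcing them exactly to zero.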
Dropout regularization involves turning off each neuron of the neural network during training with some probability p. This results in a simpler neural network at each step, since some neurons are not active at all. A simpler version of the network has less complexity, which can reduce overfitting. The deactivation of neurons with probability p is applied at each forward-propagation and weight-update step; at test time, all neurons remain active.
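A minimal sketch of one common variant, "inverted" dropout (the function name, drop probability p = 0.5, and use of NumPy's random generator are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout(activations, p=0.5, training=True):
    # During training, each unit is zeroed with probability p.
    # Scaling survivors by 1/(1-p) keeps the expected activation
    # unchanged, so nothing needs to be rescaled at test time.
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p   # True = neuron stays active
    return activations * mask / (1.0 - p)

hidden = np.ones(8)                 # hypothetical layer activations
out = dropout(hidden, p=0.5)        # roughly half the units become 0.0
```

At inference (`training=False`) the input passes through untouched, matching the convention that dropout is a training-time-only operation.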