...
Using Dropout to Combat Overfitting
Explore how dropout techniques enhance neural network models.
A major shortcoming of the baseline model was overfitting. In large models, overfitting is commonly caused by a phenomenon called coadaptation, which can be addressed with dropout. Both the coadaptation issue and its resolution with dropout are explained below.
What’s coadaptation?
If all the weights in a deep learning network are learned together, it's common for some nodes to have more predictive capability than others.
In such a scenario, because the network is trained iteratively, these powerful nodes start to suppress the weaker ones. They usually constitute only a small fraction of all the nodes. However, over many iterations, only these powerful nodes are trained, and the rest stop participating.
This phenomenon is called coadaptation. It's difficult to prevent with traditional L1 and L2 regularization because they, too, regularize based on the predictive capability of the nodes. As a result, the traditional methods become close to deterministic in choosing and rejecting weights, so a strong node gets stronger and a weak one gets weaker.
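For reference, a sketch of what these traditional penalties look like, with $\mathcal{L}(w)$ as the unregularized loss, $w_i$ as the network weights, and $\lambda$ as the regularization strength (notation introduced here only for illustration):

$$
\mathcal{L}_{L_1}(w) = \mathcal{L}(w) + \lambda \sum_i |w_i|,
\qquad
\mathcal{L}_{L_2}(w) = \mathcal{L}(w) + \lambda \sum_i w_i^2
$$

Both penalties are deterministic functions of the weights themselves, so the choice of which weights are favored or shrunk is effectively the same in every iteration, which echoes the point above about traditional regularization being unable to break coadaptation.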
A major fallout of coadaptation is that expanding the size of the neural network does not help.
This had been a severe issue in deep learning for a long time. Then, around 2012, dropout, a new regularization approach, emerged.
Dropout resolved coadaptation, which naturally revolutionized deep learning. With dropout, deeper and broader networks became possible.
What is dropout?
Dropout changed the approach to learning weights. Instead of learning all the network weights collectively, dropout trains a subset of them in each batch training iteration.
The illustrations above and below show how the model weights are trained during a batch iteration, using a simple example with four nodes. The usual training, without dropout, is shown in the illustration above. In this scheme, all the nodes are active, so all the weights are trained together.
On the other hand, with dropout, only a subset of the nodes is kept active during batch learning. The three images in the illustration above correspond to three different batch iterations. Half of the nodes are switched off in each batch iteration, while the weights of the remaining nodes are trained. After iterating through all the batches, the weights are returned as the average of their batch-wise estimates.
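To make this concrete, here is a minimal sketch of how dropout is typically added to a fully connected network, assuming a TensorFlow/Keras setup. The input shape, layer sizes, and loss are placeholders for illustration, not the baseline model from this course.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical fully connected network; sizes are illustrative only.
model = models.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),   # in each training batch, roughly half of these
                           # activations are randomly switched off
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.summary()
```

Note that dropout is applied only during training. Keras uses inverted dropout: the kept activations are scaled up by 1 / (1 − rate) at training time, so at inference all nodes are active and no extra scaling is needed.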
This technique acts as network regularization. However, to someone familiar with traditional methods, dropout may not appear to be regularization at first. Yet, there are some commonalities.
Like ...