
Skip Connections

Explore skip connections and their critical function in overcoming the vanishing gradient problem in convolutional neural networks. Understand how residual and densely connected architectures use skip connections via addition and concatenation to enhance training and feature propagation.
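As a quick preview of the two flavours discussed in this article, here is a minimal PyTorch-style sketch of a residual block (skip via element-wise addition, as in ResNet) and a densely connected block (skip via channel-wise concatenation, as in DenseNet). The module names and channel sizes are illustrative, not taken from any particular codebase.

```python
import torch
import torch.nn as nn

class AdditiveSkipBlock(nn.Module):
    """Residual-style skip: output = F(x) + x (shapes must match)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Addition: gradients also flow through the identity path.
        return torch.relu(self.conv(x) + x)


class ConcatSkipBlock(nn.Module):
    """DenseNet-style skip: output = [x, F(x)] along the channel axis."""
    def __init__(self, in_channels, growth):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, growth, kernel_size=3, padding=1),
            nn.BatchNorm2d(growth),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Concatenation: earlier features are reused, channel count grows.
        return torch.cat([x, self.conv(x)], dim=1)


x = torch.randn(1, 64, 32, 32)
print(AdditiveSkipBlock(64)(x).shape)    # torch.Size([1, 64, 32, 32])
print(ConcatSkipBlock(64, 32)(x).shape)  # torch.Size([1, 96, 32, 32])
```

Note the practical difference: addition keeps the number of channels fixed, while concatenation grows it by the block's output width.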

If you were trying to train a deep neural network back in 2014, you would almost certainly have run into the so-called vanishing gradient problem. In simple terms: you sit behind the screen watching your network train, and all you see is that the training loss stops decreasing while it is still far from the desired value. You spend the whole night checking every line of your code for a bug and find no clue.

The update rule and the vanishing gradient problem

Let’s revisit the update rule of gradient descent without momentum, given $L$ to be the loss function and $\lambda$ the learning rate:

$$w_{i}' = w_{i} + \Delta w_{i},$$

where $\Delta w_{i} = - \lambda \frac{\partial L}{\partial w_{i}}$ ...
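To make the update rule concrete, here is a small NumPy sketch of plain gradient descent on a toy quadratic loss; the variable names and the toy loss are assumptions for illustration only. It also shows why a vanishing gradient stalls training: when $\frac{\partial L}{\partial w_{i}}$ is tiny, the step $\Delta w_{i}$ is essentially zero and the weights barely move.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * ||w - w_star||^2, so dL/dw = w - w_star.
w_star = np.array([3.0, -2.0])
w = np.zeros(2)
lr = 0.1  # the learning rate lambda

for step in range(50):
    grad = w - w_star       # dL/dw_i
    delta_w = -lr * grad    # Delta w_i = -lambda * dL/dw_i
    w = w + delta_w         # w_i' = w_i + Delta w_i

print(w)  # close to w_star after 50 steps

# If the gradient has been scaled down on its way back through many layers,
# the effective update shrinks and the loss stops improving:
w = np.zeros(2)
for step in range(50):
    grad = 1e-6 * (w - w_star)  # a "vanished" gradient
    w = w - lr * grad

print(w)  # barely moved from the initialization
```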