Backpropagation Algorithm
Explore the backpropagation algorithm that enables neural networks to learn by computing gradients of the loss function with respect to weights and biases. Understand the forward and backward passes, how the chain rule applies to gradient calculation, and how these updates optimize the network's performance through iterative training.
Neural networks (NNs) are non-linear classifiers that can be formulated as a series of matrix multiplications followed by non-linear activations. Just like linear classifiers, they can be trained using the same principle we followed before, namely the gradient descent algorithm. The difficulty arises in computing the gradients.
But first things first.
Let’s start with a straightforward example of a two-layered NN, with each layer containing just one neuron.
Notations
- The superscript $L$ denotes the layer that we are in.
- $a^L$ denotes the activation of layer $L$.
- $w^L$ is the scalar weight of layer $L$.
- $b^L$ is the bias term of layer $L$.
- $C$ is the cost function, $y$ is our target class, and $\sigma$ is the activation function.
Forward pass
Our lovely model would look something like this in a simple sketch:
We can write the output of a neuron at layer $L$ as:

$$a^L = \sigma(w^L a^{L-1} + b^L)$$

To simplify things, let’s define:

$$z^L = w^L a^{L-1} + b^L$$

so that our basic equation will become:

$$a^L = \sigma(z^L)$$

We also know that our loss function is:

$$C = (a^L - y)^2$$
This is the so-called forward pass. We take some input and pass it through the network. From the output of the network, we can compute the loss $C$.
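The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a sigmoid activation and toy values for the input, weights, and biases:

```python
import math

def sigmoid(z):
    """Assumed activation function sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    z1 = w1 * x + b1   # pre-activation z^1
    a1 = sigmoid(z1)   # activation a^1
    z2 = w2 * a1 + b2  # pre-activation z^2
    a2 = sigmoid(z2)   # network output a^L
    return a2

y = 1.0  # toy target
a_L = forward(x=0.5, w1=0.8, b1=0.1, w2=-0.4, b2=0.3)
loss = (a_L - y) ** 2  # squared-error cost C
```

Each layer applies the same two steps: an affine transformation $z^L = w^L a^{L-1} + b^L$, followed by the non-linearity $\sigma$.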
Backward pass
The backward pass is the process of adjusting the weights in all the layers to minimize the loss $C$.
To adjust the weights based on the training example, we can use our known update rule:

$$w^L \leftarrow w^L - \lambda \frac{\partial C}{\partial w^L}$$

where $\lambda$ is the learning rate that scales down the gradient.

It should be clear by now that the only thing left to compute is the gradient $\frac{\partial C}{\partial w^L}$ (the derivative of the loss with respect to the weight).
One way to think about computing $\frac{\partial C}{\partial w^L}$ is through the following diagram, which is called a computational graph:

The graph summarizes the operations performed in the forward pass. To convert this into math, we need to revisit the chain rule.
The chain rule for the backward pass
To compute the gradient $\frac{\partial C}{\partial w^L}$, our most useful tool is calculus and the chain rule. Using both, we can write:

$$\frac{\partial C}{\partial w^L} = \frac{\partial C}{\partial a^L} \frac{\partial a^L}{\partial z^L} \frac{\partial z^L}{\partial w^L}$$
It is evident that the final gradient is affected by the gradients of the previous neuron, which in turn are affected by the gradients of the one before it. You can see that in order to compute the gradient, we need to go back (through the chain rule) all the way to the beginning of the network.
In other terms, we need to propagate the error backwards. This is how the backpropagation algorithm got its name.
To find the gradient, let’s compute each partial derivative. By using basic calculus, we get:

$$\frac{\partial z^L}{\partial w^L} = a^{L-1}, \qquad \frac{\partial a^L}{\partial z^L} = \sigma'(z^L), \qquad \frac{\partial C}{\partial a^L} = 2(a^L - y)$$
Combining them all together, we acquire our final gradient:

$$\frac{\partial C}{\partial w^L} = 2(a^L - y)\,\sigma'(z^L)\,a^{L-1}$$
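The three factors of the chain rule can be multiplied together directly in code. The sketch below assumes a sigmoid activation (whose derivative is $\sigma'(z) = \sigma(z)(1 - \sigma(z))$) and toy values for $a^{L-1}$, $w^L$, $b^L$, and $y$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Assumed toy values for the last layer.
a_prev = 0.6            # a^{L-1}
w, b, y = 0.5, 0.1, 1.0

z = w * a_prev + b      # z^L
a = sigmoid(z)          # a^L

# The three factors of the chain rule:
dC_da = 2.0 * (a - y)   # dC/da^L
da_dz = sigmoid_prime(z)  # da^L/dz^L
dz_dw = a_prev          # dz^L/dw^L

dC_dw = dC_da * da_dz * dz_dw  # dC/dw^L
```

A quick sanity check is to compare `dC_dw` against a finite-difference estimate of the same derivative; the two should agree to several decimal places.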
Similar equations can be derived for the biases. Instead of $\frac{\partial z^L}{\partial w^L}$, we would have:

$$\frac{\partial z^L}{\partial b^L} = 1$$

For completion, if we do the math, we get:

$$\frac{\partial C}{\partial b^L} = 2(a^L - y)\,\sigma'(z^L)$$
Now, we can adjust the weight and bias from a single training example using the update rule:

$$w^L \leftarrow w^L - \lambda \frac{\partial C}{\partial w^L}, \qquad b^L \leftarrow b^L - \lambda \frac{\partial C}{\partial b^L}$$
Next, we’ll feed in the next example, readjust the weights and biases, and repeat. This is the famous backpropagation algorithm.
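Putting the forward pass, backward pass, and update rule together gives the full training loop. The sketch below trains the two-layer, one-neuron-per-layer network on a single assumed example, with a sigmoid activation and an assumed learning rate of 0.5:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Assumed toy setup: one input, one target, learning rate 0.5.
x, y, lr = 0.5, 1.0, 0.5
w1, b1, w2, b2 = 0.8, 0.1, -0.4, 0.3

for _ in range(1000):
    # Forward pass.
    z1 = w1 * x + b1
    a1 = sigmoid(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)

    # Backward pass: chain rule, starting from the loss.
    dC_dz2 = 2.0 * (a2 - y) * sigmoid_prime(z2)
    dC_dw2, dC_db2 = dC_dz2 * a1, dC_dz2
    dC_da1 = dC_dz2 * w2  # propagate the error backwards
    dC_dz1 = dC_da1 * sigmoid_prime(z1)
    dC_dw1, dC_db1 = dC_dz1 * x, dC_dz1

    # Update rule for every weight and bias.
    w2 -= lr * dC_dw2
    b2 -= lr * dC_db2
    w1 -= lr * dC_dw1
    b1 -= lr * dC_db1

loss = (sigmoid(w2 * sigmoid(w1 * x + b1) + b2) - y) ** 2
```

After a few hundred iterations, the loss on this single example shrinks close to zero, which is exactly the behavior the update rule promises.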
You might argue that this is oversimplified because we only have one neuron per layer. To be honest, not much will change if we add more neurons per layer. We will essentially arrive at the same equations.
Here, $j$ is a neuron in the layer $L$, while $k$ is a neuron in the layer $L-1$.

And if we want to present the derivative in its final form, we have:

$$\frac{\partial C}{\partial a_k^{L-1}} = \sum_j w_{jk}^L \, \sigma'(z_j^L) \, \frac{\partial C}{\partial a_j^L}$$

where:

$$z_j^L = \sum_k w_{jk}^L a_k^{L-1} + b_j^L$$
Two final things to note here:
- The derivative with respect to the activation is a summation because the activation of a neuron in layer $L-1$ feeds into all the neurons of layer $L$, so the gradient collects a contribution from each of them.
- The same derivative also depends on the derivatives of the next layer’s activation (backpropagation of the error).
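Both points can be seen concretely in code. The sketch below computes $\frac{\partial C}{\partial a_k^{L-1}}$ for an assumed tiny layer (two neurons in layer $L-1$, three in layer $L$), with a sigmoid activation and an assumed, given vector of upstream derivatives $\frac{\partial C}{\partial a_j^L}$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Assumed small layer: w[j][k] connects neuron k of layer L-1
# to neuron j of layer L.
a_prev = [0.2, 0.7]                          # a_k^{L-1}
w = [[0.1, -0.3], [0.5, 0.8], [-0.2, 0.4]]   # w_jk^L
b = [0.0, 0.1, -0.1]                         # b_j^L
dC_da = [0.3, -0.1, 0.2]                     # dC/da_j^L, assumed given

# z_j^L = sum_k w_jk^L a_k^{L-1} + b_j^L
z = [sum(w[j][k] * a_prev[k] for k in range(2)) + b[j] for j in range(3)]

# dC/da_k^{L-1}: a sum over every neuron j in layer L that a_k feeds into.
dC_da_prev = [
    sum(w[j][k] * sigmoid_prime(z[j]) * dC_da[j] for j in range(3))
    for k in range(2)
]
```

The inner `sum` over `j` is the summation from the equation above, and `dC_da` is the backpropagated error arriving from the next layer.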
You now have a sense of how NNs learn, and that is no easy task.
Important note: We will not be computing gradients in every network that we define. The gradients are computed automatically in modern frameworks such as PyTorch.
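As an illustration of that automation, here is the same two-layer, one-neuron-per-layer chain written with PyTorch, using assumed toy values. A single call to `backward()` applies the whole chain rule for us:

```python
import torch

# Assumed toy values; requires_grad=True tells autograd to track them.
x = torch.tensor(0.5)
y = torch.tensor(1.0)
w1 = torch.tensor(0.8, requires_grad=True)
b1 = torch.tensor(0.1, requires_grad=True)
w2 = torch.tensor(-0.4, requires_grad=True)
b2 = torch.tensor(0.3, requires_grad=True)

# Forward pass.
a1 = torch.sigmoid(w1 * x + b1)
a2 = torch.sigmoid(w2 * a1 + b2)
C = (a2 - y) ** 2

# Backward pass: one call computes every partial derivative.
C.backward()

# w1.grad, b1.grad, w2.grad, b2.grad now hold dC/dw^L and dC/db^L.
```

The gradients that autograd produces match the hand-derived chain-rule expressions from this lesson, e.g. `w2.grad` equals $2(a^L - y)\,\sigma'(z^L)\,a^{L-1}$.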
No more partial derivatives!