Backpropagation Algorithm
Explore the backpropagation algorithm that enables neural networks to learn by computing gradients of the loss function with respect to weights and biases. Understand the forward and backward passes, how the chain rule applies to gradient calculation, and how these updates optimize the network's performance through iterative training.
Neural Networks (NN) are non-linear classifiers that can be formulated as a series of matrix multiplications. Just like linear classifiers, they can be trained using the same principles we followed before, namely the gradient descent algorithm. The difficulty arises in computing the gradients.
But first things first.
Let’s start with a straightforward example of a two-layered NN, with each layer containing just one neuron.
Notations
- The superscript defines the layer that we are in.
- $a^{L}$ denotes the activation of layer $L$.
- $w^{L}$ is a scalar weight of layer $L$.
- $b^{L}$ is the bias term of layer $L$.
- $C$ is the cost function, $y$ is our target class, and $\sigma$ is the activation function.
Forward pass
Our lovely model would look something like this in a simple sketch:
We can write the output of a neuron at layer $L$ as:

$$a^{L} = \sigma\left(w^{L} a^{L-1} + b^{L}\right)$$

To simplify things, let's define:

$$z^{L} = w^{L} a^{L-1} + b^{L}$$

so that our basic equation will become:

$$a^{L} = \sigma\left(z^{L}\right)$$
We also know that our loss function is:

$$C = \frac{1}{2}\left(a^{L} - y\right)^{2}$$
This is the so-called forward pass. We take some input and pass it through the network. From the output of the network, we can compute the loss $C$.
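To make the forward pass concrete, here is a minimal sketch in plain Python of the two-layer, one-neuron-per-layer network above. It assumes a sigmoid activation and the quadratic loss; the weights, biases, input, and target are arbitrary placeholder values, not values from the lesson.

```python
import math

def sigmoid(z):
    # activation function sigma
    return 1.0 / (1.0 + math.exp(-z))

# placeholder parameters for a two-layer network, one neuron per layer
w1, b1 = 0.5, 0.1    # layer 1 weight and bias
w2, b2 = -0.3, 0.2   # layer 2 weight and bias

a0 = 1.5   # input to the network
y = 1.0    # target

# forward pass
z1 = w1 * a0 + b1    # z^1 = w^1 a^0 + b^1
a1 = sigmoid(z1)     # a^1 = sigma(z^1)
z2 = w2 * a1 + b2    # z^2 = w^2 a^1 + b^2
a2 = sigmoid(z2)     # a^2 = sigma(z^2)

C = 0.5 * (a2 - y) ** 2   # quadratic loss
print(f"output: {a2:.4f}, loss: {C:.4f}")
```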
Backward pass
Backward pass is the process of adjusting the weights in all the layers to minimize the loss $C$.
To adjust the weights based on a training example, we can use our known update rule:

$$w^{L} \leftarrow w^{L} - \eta \frac{\partial C}{\partial w^{L}}$$

where $\eta$ is the learning rate that scales down the gradient.
It should be clear by now that the only thing left to compute is the gradient $\frac{\partial C}{\partial w^{L}}$.
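As a preview of that computation, the sketch below applies the chain rule by hand to the same toy network from the forward-pass sketch: it differentiates the loss through the output layer, then through the first layer, and finishes with the gradient descent update. It again assumes a sigmoid activation, the quadratic loss, and the same placeholder values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# same placeholder parameters and input as in the forward-pass sketch
w1, b1, w2, b2 = 0.5, 0.1, -0.3, 0.2
a0, y = 1.5, 1.0
eta = 0.1  # learning rate

# forward pass (caches the z and a values needed by the backward pass)
z1 = w1 * a0 + b1
a1 = sigmoid(z1)
z2 = w2 * a1 + b2
a2 = sigmoid(z2)

# backward pass: chain rule applied layer by layer, from output to input
delta2 = (a2 - y) * sigmoid_prime(z2)     # dC/dz^2, with C = 1/2 (a^2 - y)^2
dC_dw2 = delta2 * a1                      # dC/dw^2 = dC/dz^2 * dz^2/dw^2
dC_db2 = delta2                           # dz^2/db^2 = 1

delta1 = delta2 * w2 * sigmoid_prime(z1)  # dC/dz^1, chained through layer 2
dC_dw1 = delta1 * a0
dC_db1 = delta1

# gradient descent update
w2 -= eta * dC_dw2
b2 -= eta * dC_db2
w1 -= eta * dC_dw1
b1 -= eta * dC_db1
```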