Step 3 - Compute the Gradients

Explore how to calculate gradients for parameters in a linear regression model using partial derivatives and the chain rule. Understand the relationship between gradient magnitude and loss surface steepness, and see how small changes in parameters affect the loss. Gain insights into backpropagation and its role in training neural networks.

Introduction to gradients

A gradient is a partial derivative; why partial? Because one computes it with respect to (w.r.t.) a single parameter. Since we have two parameters, b and w, we must compute two partial derivatives.

A derivative tells you how much a given quantity changes when you slightly vary some other quantity. In our case, how much does our MSE loss change when we vary each one of our two parameters separately?

Gradient = how much the loss changes if ONE parameter changes a little bit!
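To make this idea concrete, here is a minimal numerical sketch. The synthetic data, the starting guess for b and w, and the step size `eps` are all assumptions made purely for illustration; the point is just to show that nudging ONE parameter by a tiny amount changes the loss, and that the ratio of those changes approximates the partial derivative:

```python
import numpy as np

# Synthetic data (assumed for illustration): roughly y = 1 + 2x plus noise
rng = np.random.default_rng(42)
x = rng.random(100)
y = 1 + 2 * x + 0.1 * rng.standard_normal(100)

def mse(b, w):
    """Mean squared error of the linear model yhat = b + w * x."""
    yhat = b + w * x
    return ((yhat - y) ** 2).mean()

# Perturb ONE parameter (b) by a tiny amount, holding w fixed
b, w = 0.5, 1.0        # some current guess for the parameters (assumed)
eps = 1e-4             # small change applied to b only
approx_grad_b = (mse(b + eps, w) - mse(b, w)) / eps

print(f"loss at (b, w):       {mse(b, w):.6f}")
print(f"loss at (b + eps, w): {mse(b + eps, w):.6f}")
print(f"finite-difference approximation of dMSE/db: {approx_grad_b:.6f}")
```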

The right-most part of the equations below is what you usually see in implementations of gradient descent for simple linear regression. The intermediate step shows all the elements that pop up from the application of the chain rule, so you know how the final expression came to be:

$$\dfrac{\partial MSE}{\partial b} = \dfrac{\partial MSE}{\partial \hat{y_i}} \cdot \dfrac{\partial \hat{y_i}}{\partial b} = \dfrac{1}{n} \sum_{i=1}^n{2(b + w x_i - y_i)}$$
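Applying the same chain rule with respect to the second parameter, w, the only difference is the extra $x_i$ factor coming from $\dfrac{\partial \hat{y_i}}{\partial w}$:

$$\dfrac{\partial MSE}{\partial w} = \dfrac{\partial MSE}{\partial \hat{y_i}} \cdot \dfrac{\partial \hat{y_i}}{\partial w} = \dfrac{1}{n} \sum_{i=1}^n{2(b + w x_i - y_i) \, x_i}$$

As a rough sketch of how the right-most expressions translate into code (the synthetic data and parameter values below are assumptions for illustration, not part of the lesson):

```python
import numpy as np

# Synthetic data (assumed for illustration), same setup as above
rng = np.random.default_rng(42)
x = rng.random(100)
y = 1 + 2 * x + 0.1 * rng.standard_normal(100)

b, w = 0.5, 1.0            # current parameter values (assumed)
yhat = b + w * x           # model predictions
error = yhat - y           # (b + w*x_i - y_i) for every data point

# Right-most part of the equations above, written as code
grad_b = 2 * error.mean()         # dMSE/db = (1/n) * sum 2*(b + w*x_i - y_i)
grad_w = 2 * (x * error).mean()   # dMSE/dw = (1/n) * sum 2*(b + w*x_i - y_i)*x_i

print(f"dMSE/db = {grad_b:.6f}")
print(f"dMSE/dw = {grad_w:.6f}")
```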