Put Gradient Descent to the Test

Explore how gradient descent optimizes model parameters by following the partial derivatives of the loss downhill until the loss is minimized. Understand the key challenges that can keep it from converging, such as overshooting the minimum and getting stuck in local minima. This lesson shows how gradient-based updates make training more efficient and more precise, and explains why mean squared error is the preferred loss function.

We'll cover the following...

The algorithm
When gradient descent fails
Mean squared error
Summary

Python 3.8

import numpy as np
# Predict the labels for inputs X with the linear model y = X * w + b
def predict(X, w, b):
    return X * w + b
# Calculate the loss as the mean squared error of the predictions
def loss(X, Y, w, b):
    return np.average((predict(X, w, b) - Y) ** 2)
# Compute the partial derivatives of the loss with respect to w and b
def gradient(X, Y, w, b):
    w_gradient = 2 * np.average(X * (predict(X, w, b) - Y))
    b_gradient = 2 * np.average(predict(X, w, b) - Y)
    return (w_gradient, b_gradient)
# Train with gradient descent, printing the loss every 5,000 iterations
def train(X, Y, iterations, lr):
    w = b = 0
    for i in range(iterations):
        if (i % 5000 == 0):
            print("Iteration %4d => Loss: %.10f" % (i, loss(X, Y, w, b)))
        w_gradient, b_gradient = gradient(X, Y, w, b)
        w -= w_gradient * lr
        b -= b_gradient * lr
    return w, b
# Load the data, train for 20,000 iterations, and predict the label for x=20
X, Y = np.loadtxt("pizza.txt", skiprows=1, unpack=True)
w, b = train(X, Y, iterations=20000, lr=0.001)
print("\nw=%.10f, b=%.10f" % (w, b))
print("Prediction: x=%d => y=%.2f" % (20, predict(20, w, b)))
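
One way to build confidence in the gradient() function above is to compare it against a numerical estimate of the same derivatives. The sketch below is only an illustration, not part of the lesson's program: numerical_gradient() is a hypothetical helper, and the data is synthetic (generated with an arbitrary slope, intercept, and noise level) because pizza.txt is not assumed to be available here. It repeats predict(), loss(), and gradient() so that it runs on its own.

```python
import numpy as np

def predict(X, w, b):
    return X * w + b

def loss(X, Y, w, b):
    return np.average((predict(X, w, b) - Y) ** 2)

def gradient(X, Y, w, b):
    w_gradient = 2 * np.average(X * (predict(X, w, b) - Y))
    b_gradient = 2 * np.average(predict(X, w, b) - Y)
    return (w_gradient, b_gradient)

# Hypothetical helper: estimate both partial derivatives with central
# differences, nudging one parameter at a time by a small step eps.
def numerical_gradient(X, Y, w, b, eps=1e-4):
    w_estimate = (loss(X, Y, w + eps, b) - loss(X, Y, w - eps, b)) / (2 * eps)
    b_estimate = (loss(X, Y, w, b + eps) - loss(X, Y, w, b - eps)) / (2 * eps)
    return (w_estimate, b_estimate)

# Assumption: synthetic stand-in data, not the contents of pizza.txt.
rng = np.random.default_rng(0)
X = rng.uniform(1, 30, size=30)
Y = 1.5 * X + 10 + rng.normal(0, 2, size=30)

# Compare the two gradients at an arbitrary point (w=0.5, b=2.0).
analytic = gradient(X, Y, w=0.5, b=2.0)
numeric = numerical_gradient(X, Y, w=0.5, b=2.0)
print("Analytic gradient: w=%.6f, b=%.6f" % analytic)
print("Numeric gradient:  w=%.6f, b=%.6f" % numeric)
print("Gradients agree:", np.allclose(analytic, numeric, rtol=1e-5))
```

If the two gradients disagree, the formulas in gradient() are the first place to look. Because the mean squared error is quadratic in w and b, the central-difference estimate should match the analytic gradient up to floating-point rounding.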

The algorithm