Tune Learning Rate and Batch Size

Learn what happens when we tweak the learning rate and batch size while training a neural network.

Tune the learning rate

Let’s start with a familiar hyperparameter: the learning rate, lr. It has been with us since almost the beginning of this course. Chances are that we’ve already tuned it, perhaps by trying a few arbitrary values. It’s time to tune lr more carefully.

To understand the trade-offs between different learning rates, let’s go back to basics and visualize gradient descent. The following diagrams show a few steps of gradient descent (GD) along a one-dimensional loss curve, for three different values of lr. The red cross marks the starting point, and the green cross marks the minimum:
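To experiment with the same idea in code, here is a minimal sketch of GD on a one-dimensional loss curve. The quadratic loss, the starting point, the number of steps, and the three lr values are all assumptions chosen for illustration, not the ones behind the course's diagrams; the function names are hypothetical as well.

```python
def loss(w):
    # Hypothetical one-dimensional loss curve with its minimum at w = 3
    return (w - 3) ** 2


def gradient(w):
    # Derivative of the loss above with respect to w
    return 2 * (w - 3)


def gradient_descent(start, lr, steps):
    # Run GD and return every position visited, starting point included
    w = start
    path = [w]
    for _ in range(steps):
        w -= lr * gradient(w)  # step against the gradient, scaled by lr
        path.append(w)
    return path


# Three illustrative learning rates: small, moderate, and too large
for lr in (0.01, 0.4, 1.1):
    path = gradient_descent(start=0.0, lr=lr, steps=10)
    print(f"lr={lr}: final w={path[-1]:.4f}, final loss={loss(path[-1]):.4f}")
```

On this particular curve, the smallest lr barely moves away from the starting point in ten steps, the moderate one lands very close to the minimum, and the largest one overshoots the minimum on every step and ends up farther away than where it started.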

