Tweak the Learning Rate

Explore how the learning rate affects the accuracy of results.

Adjust the learning rate

A 95% performance score on the MNIST dataset with our first neural network, using only simple ideas and simple Python, is not bad at all. If we wanted to stop here, it would be entirely justified.

But let’s see if we can make some easy improvements.

The first improvement we can try is to adjust the learning rate. Previously, we set it at 0.3, without really experimenting with different values.
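As a quick reminder of where this parameter appears, here is a minimal sketch of the network setup, assuming the neuralNetwork class and the 784-100-10 layout used in the earlier lessons; adjust the names and node counts if your own version differs.

```python
# Network layout assumed from the earlier MNIST lessons:
# 784 input nodes (28x28 pixels), 100 hidden nodes, 10 output nodes.
input_nodes = 784
hidden_nodes = 100
output_nodes = 10

# The learning rate we have used so far; this is the single value
# we will vary in the experiments that follow.
learning_rate = 0.3

# Create the network (assumes the neuralNetwork class built earlier).
n = neuralNetwork(input_nodes, hidden_nodes, output_nodes, learning_rate)
```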

Let’s try doubling it to 0.6, to see if a boost will be helpful or harmful to the overall network learning. If we run the code, we get a performance score of 0.9047. That’s worse than before. So, it looks like the larger learning rate leads to some bouncing around and overshooting during the gradient descent.

Let’s try again with a learning rate of 0.1. This time, the performance improves to 0.9523. It’s similar in performance to a network listed on that website that uses 1,000 hidden nodes. We’re doing pretty well with fewer!

What happens if we keep going and set the learning rate to an even smaller 0.01? The performance isn’t so good at 0.9241. So it seems that having too small a learning rate is also damaging. This makes sense, because we’re limiting the speed at which gradient descent happens, and we’re making the steps too small.
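A convenient way to run these comparisons is to wrap the training and scoring steps in a small helper and loop over candidate learning rates. The sketch below does this; run_experiment, train_network, and score_network are hypothetical placeholders standing in for the training and scoring loops from the earlier lessons.

```python
# Sweep over candidate learning rates and report the performance score
# for each one. The helpers used here are placeholders for the training
# and scoring loops built in the earlier lessons.

def run_experiment(learning_rate):
    # Build a fresh 784-100-10 network with the given learning rate
    n = neuralNetwork(784, 100, 10, learning_rate)
    train_network(n, training_data_list)      # placeholder: one pass over the training set
    return score_network(n, test_data_list)   # placeholder: fraction of test records classified correctly

for lr in [0.6, 0.3, 0.1, 0.01]:
    print("learning rate", lr, "-> performance", run_experiment(lr))
```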

Performance graph against learning rate

The following image plots a graph of these results. It’s not a very scientific approach, because we should really repeat these experiments many times to reduce the effect of randomness and of bad journeys down the gradient descent, but it is still useful for seeing the general idea that there is a sweet spot for the learning rate.

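To reproduce a plot like this yourself, a short matplotlib sketch is enough. The scores below are the ones quoted above, with the original run at 0.3 taken as the approximate 95% figure.

```python
import matplotlib.pyplot as plt

# Performance scores quoted in the text for each learning rate tried;
# 0.95 is the approximate figure for the original run at 0.3.
learning_rates = [0.01, 0.1, 0.3, 0.6]
performance = [0.9241, 0.9523, 0.95, 0.9047]

plt.plot(learning_rates, performance, marker="o")
plt.xlabel("learning rate")
plt.ylabel("performance score")
plt.title("Performance against learning rate")
plt.show()
```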