The Development Cycle of Neural Networks

Learn about the complete development cycle of training neural networks.

The development (testing) cycle

To see where the testing hurdle lies, we'll tune our neural network with an iterative process. We'll perform the following steps:

  1. Tune the network’s hyperparameters.
  2. Train the network on the training set.
  3. Test the network on the test set.
  4. Repeat until we are happy with the network’s accuracy.

This process is the ML equivalent of iterative software development, so we can simply call it the development cycle.
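To make the loop concrete, here is a minimal sketch of it in Python. The `train()` and `accuracy()` functions, the data arrays, and the list of batch sizes are all hypothetical stand-ins for the network's own code and hyperparameters:

```python
# A minimal sketch of the development cycle. train() and accuracy() are
# hypothetical stand-ins for the network's own training and evaluation
# code, and the batch sizes are just example hyperparameter values.
best_accuracy = 0
best_batch_size = None
for batch_size in [32, 64, 128, 256]:         # 1. tune a hyperparameter
    network = train(X_train, Y_train,
                    batch_size=batch_size)    # 2. train on the training set
    acc = accuracy(network, X_test, Y_test)   # 3. test on the test set
    if acc > best_accuracy:                   # 4. repeat until we're happy
        best_accuracy = acc
        best_batch_size = batch_size
```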

We already went through a few iterations of development in the previous chapter, when we measured the network's performance with different batch sizes. However, we overlooked a distressing fact: the development cycle violates the blind test rule. Here's why:

During development, we tune the neural network’s hyperparameters while looking at the network’s accuracy on the test set. By doing that, we implicitly custom-tailor the hyperparameters to get a good result on that set. In the end, our hyperparameters are optimized for the test set and are unlikely to be equally good on never-before-seen production data. In a sense, our own brain violates the blind test rule by leaking information about the test examples into the network. We threw overfitting out of the door, and it sneaked back in through the window.

Unintended optimization is a sneaky issue. As long as we are aware of it, however, we can avoid it with a low-cost approach: instead of two sets of examples, we can have three sets, one for training, one for testing, and a third that we use during the development cycle. This third set is usually called the validation set. If we use the validation set during development, we can safely use the test set at the very end of the process to get a realistic estimate of the system's accuracy.

Splitting the data

Let’s recap how this strategy works:

  1. The setup: We put the test set aside. We’ll not use it until the very end.

  2. The development cycle: We train the network on the training set as usual, but we use the validation set to gauge its performance.

  3. The final test: After we tweak and tune our hyperparameters, we test the network on the test set, which gives us an objective idea of how it will perform in production.

The key idea is to put the test set under a stone and forget about it until the very end of the process. We should not touch the test set during development: as soon as we do, we violate the blind test rule and risk an unrealistically optimistic measure of the system's accuracy.

How many examples should we set aside for the validation and test sets? That depends on the specific problem. Some people recommend setting aside 20% of the examples for the validation set, and just as many for the test set, leaving 60% for training. That's called the 60/20/20 split.
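As an illustration, assuming the examples and labels live in NumPy arrays `X` and `Y` (hypothetical names), a 60/20/20 split could be sketched like this:

```python
import numpy as np

# Hypothetical sketch of a 60/20/20 split of arrays X and Y.
# Shuffle first so that each set gets a representative mix of examples.
shuffled = np.random.permutation(len(X))
X, Y = X[shuffled], Y[shuffled]

train_end = int(len(X) * 0.6)        # first 60% for training
validation_end = int(len(X) * 0.8)   # next 20% for validation

X_train, Y_train = X[:train_end], Y[:train_end]
X_validation, Y_validation = X[train_end:validation_end], Y[train_end:validation_end]
X_test, Y_test = X[validation_end:], Y[validation_end:]   # last 20% for testing
```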

However, in MNIST we have plenty of examples—70,000 in total. It feels like a waste to set aside almost 30,000 of them for validation and testing. Instead, we can take the 10,000 examples from the current test set and split them into two groups of 5,000—one for the validation set, and one for the new test set.
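A minimal sketch of that split, assuming the current test examples and labels are already loaded into NumPy arrays `X_test` and `Y_test` with 10,000 rows each (the actual names depend on how the MNIST loading code is written):

```python
# Split the current 10,000-example test set into a validation set and a
# new test set of 5,000 examples each. The variable names are assumptions;
# adapt them to however the MNIST data is loaded.
X_validation, X_test = X_test[:5000], X_test[5000:]
Y_validation, Y_test = Y_test[:5000], Y_test[5000:]
```

From this point on, the validation set drives the development cycle, and the new, smaller test set stays untouched until the final test.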
