Introduction to Batch

Learn what a batch is and how we can implement it.

Mini-batch gradient descent

The style of gradient descent that we have used so far is also called batch gradient descent because it clusters all the training examples into one big batch, and calculates the gradient of the loss over the entire batch. A common alternative is called mini-batch gradient descent. It segments the training set into smaller batches, and then takes a step of gradient descent for each batch.

We might wonder how small batches help speed up training. Let’s implement a mini-batch GD and give it a test drive.

Implementing batches

In most cases, we should shuffle a dataset before splitting it into batches. That way, we’re sure each batch contains a nice mix of examples instead of having all the examples of a certain type clustered in the same batch. However, MNIST already comes pre-shuffled, so we can just take the training set and split it into batches straight away. This function does the job:

Get hands-on with 1200+ tech skills courses.