Data Comes First

Get familiar with a well known dataset, MNIST.

About data

It is time to leap from simple pizza-related examples to image recognition. We are about to do something magical. Within a few lessons, we’ll have a program that classifies images.

In this chapter and the next, we’ll apply our binary classifier to MNIST. It is a database of handwritten digits. A few years ago, before ML systems could tackle more complex datasets, AI researchers used MNIST as a benchmark for their algorithms. This will be our first experience with computer vision, so let’s start with small steps. We’ll begin by recognizing a single MNIST digit and leave more general character recognition to the next chapter.

Before we feed data to our ML system, let’s get up close and personal with that data. This section tells us all we need to know about MNIST.

Getting to know MNIST

MNIST is a collection of labeled images that has been assembled specifically for supervised learning. Its name stands for “Modified NIST” because it’s a remix of earlier data from the National Institute of Standards and Technology. MNIST contains images of handwritten digits labeled with their numerical values. Here are a few random images, capped by their labels:

Get hands-on with 1200+ tech skills courses.