Our Own MNIST Library
Explore how to convert MNIST images and labels into input matrices for a binary classifier, focusing on recognizing the digit 5. Understand reshaping images into feature vectors, encoding labels for binary output, and developing a small library to load and preprocess this data effectively.
We'll cover the following...
Goal of the lesson
In the previous lessons, we built a binary classifier. Now we want to apply that program to MNIST.
The first step is to reshape MNIST’s images and labels into an input for our program. Let’s see how to do that.
Prepare the input matrices
Our binary classifier program expects its input formatted as two matrices: a set of examples and a set of labels . Let’s start with the matrix of examples .
is supposed to have one line per example and one column per input variable, plus a bias column full of 1s. (Remember the bias column? We talked about it in the Bye-bye, bias section)
To fit MNIST’s images to this format, we can reshape each image to a single line of pixels so that each pixel becomes an input variable. MNIST images are 28 by 28 pixels, so squashing them results in lines of 784 elements. And throw in the bias column, and that makes 785. So, X should look like a matrix of 60,000 lines (the number of examples) and 785 columns (one per pixel, plus the bias). Check out the following graph:
We just graduated from simple examples with three or four input variables to tens of thousands of examples and hundreds of input variables!
Now let’s look at , the matrix of labels. At first glance, it looks simpler than the matrix of images. It still has one line per example, but only one column, that contains the label. However, we have an additional difficulty here. So far, we have only built binary classifiers, which expect a label that’s either 0 or 1. By contrast, MNIST labels range from 0 to 9. How can we fit ten different values into either 0 or 1?
For now, we can work around that problem by narrowing our scope. Let’s start by recognizing one specific digit, the digit 5. This is a binary classification problem because a digit can belong to two classes: “not a 5” and “5.”
This means that we should convert all MNIST labels to 0s, except for 5s, which we should convert to 1s. So Our matrix looks like:
Now we know how to build and . Let’s turn this plan into code.
Build a library
The Internet overflows with libraries and code snippets that read MNIST data. But we are developers, so hey, let’s write one more!
In this section, we’ll introduce the code for a tiny library that loads those images and labels and reshapes them into the and that we just described.
Load images
Here is the code that loads MNIST images into two matrices: one for the training examples and one for the test examples:
Let’s go through this code quickly.The load_images() file unzips and decodes images from MNIST’s binary files. This function is specific to MNIST’s binary format, so we do not need to understand its details. If we’re curious to learn them, we should know that struct.unpack() reads data from a binary file according to a pattern string. In this case, the pattern is '>IIII', which means four unsigned integers encoded with the most significant byte first. The code’s comments should help us understand the rest of this function.
The load_images() returns a matrix that’s either (60000,784) in the case of the training images or (10000,784) in the case of the test images. Those matrices can then be passed to the second function, prepend_bias(), to give them an extra column full of 1s for the bias.
And that’s it about the images. Now, we’ll learn about the labels.
Load labels
The code below loads and prepares MNIST’s labels:
load_labels() loads MNIST labels into a NumPy array, and then molds that array into a one-column matrix. Once again, we do not have to understand this code, as we are not likely to load MNIST labels that often—but read the comments if you are curious.
Note: Remember that
reshape(-1, 1)means arranging these data into a matrix with one column and many rows we need.
The function returns a matrix with shape or , depending on whether we load the training labels or the test labels.
The matrix returned by load_labels() contains labels from 0 to 9. We can pass that matrix to encode_fives() to turn those labels into binary values. We do not
need encode_fives() for long, so there is no need to parameterize it. We can hard-code it to encode the digit 5.
The one line in encode_fives() is a typical NumPy code—it’s short and efficient, but it might be a bit difficult to understand at first glance. To clarify, (Y == 5) means an array that contains True where Y contains a 5 and False where it does not. That array is then converted to an array of integers so that all True values become 1, and False values become 0. The end result is a new matrix with the same shape as Y that contains 1 where Y contains a 5 and 0 elsewhere.
After those functions, the final lines in the code define two constants. We can use them to access the training labels and the test labels, respectively.
With that, our MNIST library is complete. Let’s save it as a file (mnist.py), and use it to feed our ML program.
Display the numbers
To display sample digits, we’ll use load_images() function from mnist.py. We can change the variable DIGIT in main.py to get an idea of the MNIST dataset.