
Our Own MNIST Library

Explore how to convert MNIST images and labels into input matrices for a binary classifier, focusing on recognizing the digit 5. Understand reshaping images into feature vectors, encoding labels for binary output, and developing a small library to load and preprocess this data effectively.

Goal of the lesson

In the previous lessons, we built a binary classifier. Now we want to apply that program to MNIST.

The first step is to reshape MNIST’s images and labels into an input for our program. Let’s see how to do that.

Prepare the input matrices

Our binary classifier program expects its input formatted as two matrices: a set of examples X and a set of labels Y. Let’s start with the matrix of examples X.

X is supposed to have one line per example and one column per input variable, plus a bias column full of 1s. (Remember the bias column? We talked about it in the “Bye-bye, bias” section.)

To fit MNIST’s images to this format, we can reshape each image into a single line of pixels, so that each pixel becomes an input variable. MNIST images are 28 by 28 pixels, so squashing them results in lines of 784 elements. Throw in the bias column, and that makes 785. So X should be a matrix of 60,000 lines (the number of examples) and 785 columns (one per pixel, plus the bias). Check out the following graph:

We just graduated from simple examples with three or four input variables to tens of thousands of examples and hundreds of input variables!
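To make the 784 and 785 concrete, here is a small NumPy sketch. It uses four hypothetical images built with np.arange instead of real MNIST data, squashes each one into a line, and checks where a given pixel ends up:

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 images of 28 by 28 pixels
# (the real training set has 60,000 of them).
images = np.arange(4 * 28 * 28).reshape(4, 28, 28)

# Squash each image into one line of 28 * 28 = 784 pixels.
# In row-major order, pixel (r, c) of an image lands in column r * 28 + c.
lines = images.reshape(len(images), 28 * 28)
print(lines.shape)                              # (4, 784)
print(images[1, 2, 3] == lines[1, 2 * 28 + 3])  # True

# One column per pixel, plus the bias column:
print(28 * 28 + 1)  # 785
```

With the real data, the same reshape turns 60,000 images into 60,000 lines, and the bias column brings each line to 785 elements.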

Now let’s look at Y, the matrix of labels. At first glance, it looks simpler than the matrix of images. It still has one line per example, but only one column, which contains the label. However, there is an additional difficulty here. So far, we have only built binary classifiers, which expect a label that’s either 0 or 1. By contrast, MNIST labels range from 0 to 9. How can we fit ten different values into either 0 or 1?

For now, we can work around that problem by narrowing our scope. Let’s start by recognizing one specific digit, the digit 5. This is a binary classification problem because a digit can belong to two classes: “not a 5” and “5.”

This means that we should convert all MNIST labels to 0s, except for 5s, which we should convert to 1s. So our Y matrix looks like this:

Now we know how to build X and Y. Let’s turn this plan into code.

Build a library

The Internet overflows with libraries and code snippets that read MNIST data. But we are developers, so hey, let’s write one more!

In this section, we’ll introduce the code for a tiny library that loads those images and labels and reshapes them into the X and Y that we just described.

Load images

Here is the code that loads MNIST images from a file into a matrix. We use it to build two matrices: one for the training examples and one for the test examples.

```python
import gzip
import struct

import numpy as np


def load_images(filename):
    # Open and unzip the file of images:
    with gzip.open(filename, 'rb') as f:
        # Read the header information into a bunch of variables:
        # magic number (ignored), number of images, rows, columns
        _ignored, n_images, rows, columns = struct.unpack('>IIII', f.read(16))
        # Read all the pixels into a NumPy array of bytes:
        all_pixels = np.frombuffer(f.read(), dtype=np.uint8)
        # Reshape the pixels into a matrix where each line is an image:
        return all_pixels.reshape(n_images, rows * columns)


def prepend_bias(X):
    # Insert a column of 1s in the position 0 of X.
    # ("axis=1" stands for: "insert a column, not a row")
    return np.insert(X, 0, 1, axis=1)
```

Let’s go through this code quickly. The load_images() function unzips and decodes images from MNIST’s binary files. This function is specific to MNIST’s binary format, so we don’t need to understand its details. If we’re curious, though, the key is struct.unpack(), which decodes a sequence of bytes according to a format string. In this case, the format is '>IIII', which means four 4-byte unsigned integers, each encoded with the most significant byte first. Those integers are a “magic number” that identifies the file type (the code ignores it), followed by the number of images and the number of rows and columns in each image. The code’s comments should help us understand the rest of this function.
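To see what '>IIII' does, we can pack a made-up header ourselves and decode it again. This sketch assumes the standard MNIST (IDX) layout, where an image file starts with the magic number 2051, then the image count, the rows, and the columns. The header bytes are built by hand, not read from a real file:

```python
import struct

# A hypothetical 16-byte header laid out like MNIST's training-image file:
# magic number 2051, 60,000 images, 28 rows, 28 columns.
header = struct.pack('>IIII', 2051, 60000, 28, 28)
print(len(header))  # 16

# Decode it with the same format string that load_images() uses:
magic, n_images, rows, columns = struct.unpack('>IIII', header)
print(magic, n_images, rows, columns)  # 2051 60000 28 28

# '>' means most significant byte first. Decoding the same bytes with '<'
# (least significant byte first) garbles the numbers:
print(struct.unpack('<IIII', header)[2])  # 469762048
```

The last line shows why the '>' matters: the byte order of the file has to match the byte order in the format string.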

load_images() returns a matrix that’s either (60000, 784) for the training images or (10000, 784) for the test images. We can then pass those matrices to the second function, prepend_bias(), which gives them an extra column full of 1s for the bias, for a total of 785 columns.
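We can check both functions end to end without downloading MNIST. The sketch below writes a hypothetical MNIST-style file with three synthetic 28 by 28 images to a temporary directory, then loads it with copies of load_images() and prepend_bias(). The file name and pixel values are made up for the test:

```python
import gzip
import os
import struct
import tempfile

import numpy as np


def load_images(filename):
    # Same logic as load_images() in the listing above.
    with gzip.open(filename, 'rb') as f:
        _ignored, n_images, rows, columns = struct.unpack('>IIII', f.read(16))
        all_pixels = np.frombuffer(f.read(), dtype=np.uint8)
        return all_pixels.reshape(n_images, rows * columns)


def prepend_bias(X):
    # Same logic as prepend_bias() in the listing above.
    return np.insert(X, 0, 1, axis=1)


# Build a hypothetical MNIST-style file that holds 3 images of 28x28 pixels.
fake_images = (np.arange(3 * 28 * 28) % 256).astype(np.uint8).reshape(3, 28, 28)
path = os.path.join(tempfile.mkdtemp(), "fake-images-idx3-ubyte.gz")
with gzip.open(path, 'wb') as f:
    f.write(struct.pack('>IIII', 2051, 3, 28, 28))  # magic, count, rows, columns
    f.write(fake_images.tobytes())                  # pixels, one byte each

X = load_images(path)
print(X.shape)  # (3, 784)
X = prepend_bias(X)
print(X.shape)  # (3, 785)
print(X[:, 0])  # [1 1 1]
```

With the real training file, the same two calls would produce a (60000, 785) matrix.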

That’s it for the images. Now let’s move on to the labels.

Load labels

The code below loads and prepares MNIST’s labels:

```python
import gzip

import numpy as np


def load_labels(filename):
    # Open and unzip the file of labels:
    with gzip.open(filename, 'rb') as f:
        # Skip the header bytes (magic number and number of labels):
        f.read(8)
        # Read all the labels into a string of bytes:
        all_labels = f.read()
        # Reshape the labels into a one-column matrix:
        return np.frombuffer(all_labels, dtype=np.uint8).reshape(-1, 1)


def encode_fives(Y):
    # Convert all 5s to 1, and everything else to 0
    return (Y == 5).astype(int)
```
mnist.py

load_labels() loads MNIST labels into a NumPy array, and then molds that array into a one-column matrix. Once again, we don’t need to understand this code in detail, because we are not likely to load MNIST labels often; read the comments if you are curious.

Note: Remember that reshape(-1, 1) arranges the data into a matrix with one column and as many rows as needed. The -1 tells NumPy to compute that number of rows by itself.
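Here is a tiny illustration of what the last line of load_labels() does, using a handful of hypothetical label bytes in place of a real file:

```python
import numpy as np

# Hypothetical label bytes, as load_labels() would read them after the header.
raw = bytes([5, 0, 4, 1, 9, 2])
labels = np.frombuffer(raw, dtype=np.uint8)
print(labels.shape)  # (6,)

# -1 asks NumPy to compute the number of rows: 6 values / 1 column = 6 rows.
Y = labels.reshape(-1, 1)
print(Y.shape)  # (6, 1)

# The same rule with two columns: 6 values / 2 columns = 3 rows.
print(labels.reshape(-1, 2).shape)  # (3, 2)
```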

The function returns a matrix with shape (60000, 1) or (10000, 1), depending on whether we load the training labels or the test labels.

The matrix returned by load_labels() contains labels from 0 to 9. We can pass that matrix to encode_fives() to turn those labels into binary values. We do not need encode_fives() for long, so there is no need to parameterize it. We can hard-code it to encode the digit 5.

The one line in encode_fives() is typical NumPy code: it’s short and efficient, but it might be a bit difficult to understand at first glance. To clarify, (Y == 5) produces an array of booleans with the same shape as Y, containing True where Y contains a 5 and False where it doesn’t. astype(int) then converts that array to integers, so that True values become 1 and False values become 0. The end result is a new matrix with the same shape as Y that contains 1 where Y contains a 5 and 0 elsewhere.
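The two steps are easier to see on a small, hypothetical label matrix. This sketch copies encode_fives() and prints the intermediate boolean array next to the final result:

```python
import numpy as np


def encode_fives(Y):
    # Same one-liner as in mnist.py: 1 where Y contains a 5, 0 elsewhere.
    return (Y == 5).astype(int)


Y = np.array([[5], [0], [4], [1], [9], [5]], dtype=np.uint8)  # hypothetical labels

print((Y == 5).ravel().tolist())         # [True, False, False, False, False, True]
print(encode_fives(Y).ravel().tolist())  # [1, 0, 0, 0, 0, 1]
print(encode_fives(Y).shape)             # (6, 1)
```

The output keeps the (6, 1) shape of Y, so it can stand in directly for the label matrix of the binary classifier.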

After those functions, the final lines of the complete mnist.py define two constants, which we can use to access the training labels and the test labels, respectively.

With that, our MNIST library is complete. Let’s save it as a file (mnist.py), and use it to feed our ML program.

Display the numbers

To display sample digits, we’ll use the load_images() and load_labels() functions from mnist.py. Change the DIGIT variable in main.py to browse other digits in the MNIST dataset.

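```python
# main.py
import mnist
import numpy as np
import matplotlib.pyplot as plt

DIGIT = 5

X = mnist.load_images("/programming-machine-learning/data/mnist/train-images-idx3-ubyte.gz")
# Flatten the (60000, 1) label matrix to a 1-D array, so it can select rows of X:
Y = mnist.load_labels("/programming-machine-learning/data/mnist/train-labels-idx1-ubyte.gz").flatten()

# Keep only the images labeled DIGIT, in random order:
digits = X[Y == DIGIT]
np.random.shuffle(digits)

# Show the first 45 of them in a grid of 3 rows and 15 columns:
rows, columns = 3, 15
fig = plt.figure()
for i in range(rows * columns):
    ax = fig.add_subplot(rows, columns, i + 1)
    ax.axis('off')
    # Each image is a line of 784 pixels: reshape it back to 28x28 to draw it.
    ax.imshow(digits[i].reshape((28, 28)), cmap="Greys")
plt.show()
```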
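The display code calls flatten() on the labels because load_labels() returns a one-column matrix, while selecting rows of X with a boolean mask needs a one-dimensional mask. A small sketch with hypothetical arrays shows the difference:

```python
import numpy as np

# Hypothetical stand-ins: 4 "images" of 3 pixels each, and their (4, 1) labels.
X = np.arange(12).reshape(4, 3)
Y = np.array([[5], [0], [5], [2]])

# With the (4, 1) label matrix, the mask doesn't line up with X's columns:
try:
    X[Y == 5]
    raised = False
except IndexError as error:
    raised = True
    print("IndexError:", error)

# Flattening the labels gives a 1-D mask with one entry per row of X:
mask = Y.flatten() == 5
print(mask.tolist())     # [True, False, True, False]
digits = X[mask]
print(digits.tolist())   # [[0, 1, 2], [6, 7, 8]]
```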