CNN for Classification

Explore how to build a convolutional neural network to classify handwritten digit images from the MNIST dataset. Understand input and output tensor shapes, convolutional layers, pooling, and how to assemble a CNN architecture. Learn the role of ReLU activation, dropout, and converting logits to probabilities. This lesson provides foundational knowledge to implement and train CNNs for image classification tasks.

We saw in the previous lesson that a multilayer perceptron can be trained to classify tabular data. Images differ from tabular data in that their information is unstructured: no predefined pixel holds a useful feature for classification, because objects in an image can appear anywhere and in various poses. For this reason, we generally should not treat an image as a vector of length $H \times W$ and process it with a multilayer perceptron.
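To make this concrete, here is a small pure-Python sketch. It assumes a toy 28×28 "image" containing a single vertical stroke (hypothetical data, not an actual MNIST sample) and flattens it into a vector of length 784. A first-layer unit of a multilayer perceptron computes a dot product between its weight vector and this input. If its weights happen to match the stroke at one position, the same stroke shifted three pixels to the right produces no response at all.

```python
# Toy illustration (hypothetical data): a flattened image is sensitive to where
# an object appears, because each vector position is tied to one fixed pixel.
H, W = 28, 28


def stroke_image(col, top=6, length=16):
    """Return an H×W grid of zeros with a vertical stroke of ones at column `col`."""
    img = [[0.0] * W for _ in range(H)]
    for r in range(top, top + length):
        img[r][col] = 1.0
    return img


def flatten(img):
    """Row-major flattening into a vector of length H * W."""
    return [v for row in img for v in row]


def dot(u, v):
    """What a single MLP unit computes before its bias and activation."""
    return sum(a * b for a, b in zip(u, v))


original = flatten(stroke_image(col=10))
shifted = flatten(stroke_image(col=13))  # same stroke, moved 3 pixels right

# Suppose a unit's weights happen to equal the original stroke pattern.
weights = original
response_original = dot(weights, original)
response_shifted = dot(weights, shifted)
print(len(original), response_original, response_shifted)  # 784 16.0 0.0
```

A convolution avoids this problem by sliding the same small kernel across every position, so the stroke produces the same response at its shifted location.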

A better approach is to extract high-level features through a composition of convolutional layers and spatial pooling. Once the spatial resolution is sufficiently low, we can flatten the resulting feature maps into a vector. This vector can then be processed by a multilayer perceptron.
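The pipeline of convolution and pooling, repeated until the spatial resolution is small and followed by flattening, can be sketched in pure Python for a single channel. This is a minimal illustration under simplifying assumptions: one 3×3 kernel with hand-picked values (a real CNN learns many kernels per layer), no padding, stride 1, ReLU after each convolution, and 2×2 max pooling. As in most deep learning libraries, the "convolution" is computed as cross-correlation, meaning the kernel is not flipped.

```python
# Minimal single-channel sketch: (conv -> ReLU -> max-pool) twice, then flatten.
# The kernel is hand-picked for illustration; a trained CNN learns its kernels.


def conv2d(img, kernel):
    """'Valid' 2D cross-correlation (no padding, stride 1), as in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2d(x, k=2):
    """Non-overlapping k×k max pooling; leftover rows/columns are dropped."""
    out_h, out_w = len(x) // k, len(x[0]) // k
    return [
        [
            max(x[i * k + a][j * k + b] for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def shape(x):
    return (len(x), len(x[0]))


# Toy 28×28 input: a vertical stroke at column 10 (hypothetical, not MNIST data).
image = [[1.0 if (c == 10 and 6 <= r < 22) else 0.0 for c in range(28)] for r in range(28)]

# Hand-picked vertical-edge kernel (an assumption for illustration).
kernel = [
    [1.0, 0.0, -1.0],
    [1.0, 0.0, -1.0],
    [1.0, 0.0, -1.0],
]

shapes = []
x = image
for _ in range(2):
    x = relu(conv2d(x, kernel))
    shapes.append(shape(x))
    x = max_pool2d(x)
    shapes.append(shape(x))

features = [v for row in x for v in row]  # flatten for the MLP head
print(shapes)         # [(26, 26), (13, 13), (11, 11), (5, 5)]
print(len(features))  # 25
```

Each stage shrinks the spatial size (28 → 26 → 13 → 11 → 5), and only the final 5×5 map is flattened for the multilayer perceptron. With a padding of 1 on each 3×3 convolution, a common choice, the sizes would instead go 28 → 28 → 14 → 14 → 7.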

Problem statement

In this lesson, we build a CNN from the building blocks studied in the previous lesson. The goal is to classify monochrome (single-channel) images of handwritten digits from the MNIST dataset. Each image is 28×28 pixels, and the 10 classes are the digits zero through nine.

Sample from the MNIST dataset
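Since there are 10 classes, a classifier of this kind typically ends with one raw score (logit) per digit, and a softmax turns those scores into probabilities that sum to 1. The sketch below assumes made-up logits for a single image; the predicted digit is the index with the highest probability.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one image, one score per digit 0..9.
logits = [0.1, -1.2, 0.3, 2.5, -0.7, 0.0, 0.4, -2.0, 1.1, 0.2]
probs = softmax(logits)
predicted_digit = max(range(10), key=lambda d: probs[d])
print(predicted_digit)  # 3
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged but prevents overflow for large scores. Because softmax is monotonic, the digit with the highest probability is also the digit with the highest logit.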

The input and output tensor shapes

The problem statement imposes the shapes of the input tensor and the output tensor.

The input tensor must have a shape $(N, C_{in}, H, W) = (N, 1, 28, 28)$ ...