CNN for Classification

Learn to build a classification CNN with the building blocks we introduced.

We saw in the previous lesson that a multilayer perceptron can be trained to classify tabular data. An image differs from tabular data in the sense that an image contains unstructured information. We cannot go to a predefined pixel to extract a useful feature for the classification. Objects in an image dataset can appear anywhere under various poses. For this reason, we generally cannot treat an image as a vector of length H×WH\times W and process it with a multilayer perceptron.

A better approach is to extract high-level features through a composition of convolution layers and spatial pooling. At some point, the spatial resolution is sufficiently low, and we can flatten the image into a vector. This vector can then be processed by a multilayer perceptron.

Problem statement

In this lesson, our task is to build a CNN with the building blocks we studied in the previous lesson. The task is to classify monochrome images of handwritten digits from the MNIST dataset. Each image has a size of 28x28. The classes are the 10 digits, from zero to nine.

Press + to interact
Sample from the MNIST dataset
Sample from the MNIST dataset

The input and output tensor shapes

The problem statement imposes the shapes of the input tensor and the output tensor.

The input tensor must have a shape ...