CNN for Classification
Explore how to build a convolutional neural network to classify handwritten digit images from the MNIST dataset. Understand input and output tensor shapes, convolutional layers, pooling, and how to assemble a CNN architecture. Learn the role of ReLU activation, dropout, and converting logits to probabilities. This lesson provides foundational knowledge to implement and train CNNs for image classification tasks.
In the previous lesson, we saw that a multilayer perceptron can be trained to classify tabular data. An image differs from tabular data in that its information is unstructured: we cannot go to a predefined pixel to extract a feature that is useful for classification, because objects in an image dataset can appear anywhere and in various poses. For this reason, we generally cannot treat an image as a vector of length
A better approach is to extract high-level features through a composition of convolution layers and spatial pooling. Once the spatial resolution is sufficiently low, we can flatten the resulting feature maps into a vector. This vector can then be processed by a multilayer perceptron.
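To make the shrinking spatial resolution concrete, the following sketch traces the shape of a single 28x28 monochrome image through two convolution-plus-pooling stages and then computes the length of the flattened vector. The kernel size, padding, pooling window, and channel counts (16 and 32) are illustrative assumptions, not values taken from this lesson, and the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after sliding a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, channels, stages):
    """Return the (channels, height, width) shape after each stage.

    Each stage is (out_channels, conv_kernel, conv_padding, pool_size).
    All of these values are assumptions chosen for illustration.
    """
    shapes = [(channels, height, width)]
    for out_channels, kernel, padding, pool in stages:
        # Convolution: sets the channel count, may change the spatial size.
        height = conv_output_size(height, kernel, padding=padding)
        width = conv_output_size(width, kernel, padding=padding)
        # Pooling: non-overlapping window, so the stride equals the window size.
        height = conv_output_size(height, pool, stride=pool)
        width = conv_output_size(width, pool, stride=pool)
        channels = out_channels
        shapes.append((channels, height, width))
    return shapes


# Two hypothetical stages: 3x3 convolutions with padding 1, then 2x2 pooling.
stages = [(16, 3, 1, 2), (32, 3, 1, 2)]
shapes = trace_shapes(28, 28, 1, stages)

final_channels, final_height, final_width = shapes[-1]
flattened_length = final_channels * final_height * final_width

for shape in shapes:
    print(shape)
print("flattened length:", flattened_length)
```

Under these assumptions, each pooling step halves the height and width, so the vector handed to the multilayer perceptron summarizes features over a 7x7 grid rather than individual pixels.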
Problem statement
In this lesson, we build a CNN from the building blocks we studied in the previous lesson. The task is to classify monochrome images of handwritten digits from the MNIST dataset. Each image is 28x28 pixels. There are 10 classes: the digits zero to nine.
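With 10 classes, a classifier of this kind typically produces one raw score (a logit) per digit, and the predicted digit is the one with the highest score. The sketch below assumes a softmax is used to turn logits into probabilities. The logit values are made up for illustration and do not come from a trained network.

```python
import math

NUM_CLASSES = 10  # the digits zero to nine


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability
    # without changing the result.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one image, one entry per digit class.
logits = [0.1, -1.2, 0.3, 2.5, 0.0, -0.7, 0.4, 1.1, -0.3, 0.2]
assert len(logits) == NUM_CLASSES

probabilities = softmax(logits)
# The predicted digit is the index of the largest probability.
predicted_digit = max(range(NUM_CLASSES), key=lambda d: probabilities[d])
print(predicted_digit)
```

Because softmax preserves the ordering of the scores, taking the largest probability selects the same digit as taking the largest logit.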
The input and output tensor shapes
The problem statement imposes the shapes of the input tensor and the output tensor.
The input tensor must have a shape