The Building Blocks of CNNs

Learn how a computer applies a convolutional neural network to an image.

In the next lessons, we’ll go through the core components of convolutional neural networks. To be clear, we should not expect to be able to build a CNN after reading through these lessons. However, we’ll get an idea of how CNNs work, enough to run a convolutional network on CIFAR-10.

Let’s start with the most important difference between how a fully connected neural network and a convolutional neural network look at data.

An image is an image

The first time we built a network to process images (in Getting Real), we noted a potentially surprising detail: the neural network does not know that the examples are images. Instead, it sees them as flat sequences of bytes. We even flattened the images right after loading them, and we still did that in the last lesson:

from keras.datasets import cifar10

(X_train_raw, Y_train_raw), (X_test_raw, Y_test_raw) = cifar10.load_data()
X_train = X_train_raw.reshape(X_train_raw.shape[0], -1) / 255  # flatten each image, rescale to [0, 1]

Besides rescaling the input data, this code flattens each example to a row of bytes. CIFAR-10 contains 60,000 images, 50,000 of which are in the training set. Each image is 32 by 32 pixels with three color channels, for a total of 3,072 values per image. This code reshapes the training data into a (50000, 3072) matrix. Our network flattens the test and the validation set in the same way.
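To check the shape arithmetic without downloading the dataset, here is a minimal sketch that uses a small zero-filled NumPy array as a stand-in for the CIFAR-10 training images. The name `fake_images` and the batch of 5 examples are assumptions made for illustration only.

```python
import numpy as np

# Stand-in for X_train_raw: 5 fake images, 32 by 32 pixels, 3 color channels.
# (Hypothetical data; the real training array has many more examples.)
fake_images = np.zeros((5, 32, 32, 3), dtype=np.uint8)

# Same reshape as in the lesson: keep the first axis, collapse the rest.
flattened = fake_images.reshape(fake_images.shape[0], -1) / 255

values_per_image = 32 * 32 * 3
print(flattened.shape)    # (5, 3072)
print(values_per_image)   # 3072
```

The `-1` tells NumPy to infer the second dimension, so the same line works for any number of examples.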

By contrast, here is the equivalent code for a CNN:

from keras.datasets import cifar10

(X_train_raw, Y_train_raw), (X_test_raw, Y_test_raw) = cifar10.load_data()
X_train = X_train_raw / 255  # rescale to [0, 1], keep the image shape

This code still loads and rescales data, but it does not flatten the images. The CNN knows that it’s dealing with images and preserves their shape in a (50000, 32, 32, 3) tensor: 50,000 images, each 32 by 32 pixels, with three color channels. The test and validation sets would also be four-dimensional tensors.

To recap, a fully connected network ignores the geometric information in an image, and a convolutional network keeps that information around. That difference does not stop at the input layer. The hidden convolutional layers in a CNN also take four-dimensional data as their input, and they output four-dimensional data for the following layer.
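One way to see what “ignoring the geometric information” means is to follow two vertically adjacent pixels through the flattening step. The sketch below is only an illustration: the helper `flat_index` is hypothetical, not part of the lesson’s code, and it assumes NumPy’s default row-major layout.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 32, 32, 3

def flat_index(row, col, channel):
    """Position of one pixel value after a row-major flatten (hypothetical helper)."""
    return (row * WIDTH + col) * CHANNELS + channel

# Cross-check the formula against NumPy's own index arithmetic.
assert flat_index(10, 20, 1) == np.ravel_multi_index(
    (10, 20, 1), (HEIGHT, WIDTH, CHANNELS)
)

# Two pixels that touch vertically in the image...
above = flat_index(10, 20, 0)
below = flat_index(11, 20, 0)

# ...end up a full image row apart in the flattened vector.
distance = below - above
print(distance)  # 96, that is 32 pixels * 3 channels
```

A fully connected layer has no built-in notion that positions 96 apart used to be neighbors, while a layer that receives the (32, 32, 3) shape can rely on that adjacency directly.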

Speaking of convolutional layers, let’s talk about the operation they’re based on.


In the field of image processing, a convolution is an operation that involves two matrices:

  1. An image
  2. A filter

Here is an example:
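As a minimal sketch of the idea, assuming a made-up 4 by 4 grayscale image and a made-up 2 by 2 filter (neither is taken from the lesson), the code below slides the filter over the image and, at each position, multiplies the overlapping values and sums them. The helper name `convolve2d_valid` is hypothetical. The filter is applied without flipping, which is how CNN libraries usually do it, even though that is strictly a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, image_filter):
    """Slide `image_filter` over `image` without padding (hypothetical helper).

    The filter is not flipped, so strictly this is a cross-correlation.
    """
    image_h, image_w = image.shape
    filter_h, filter_w = image_filter.shape
    out_h = image_h - filter_h + 1
    out_w = image_w - filter_w + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            # Patch of the image currently covered by the filter.
            patch = image[row:row + filter_h, col:col + filter_w]
            output[row, col] = np.sum(patch * image_filter)
    return output

# Made-up 4x4 grayscale image: dark left half, bright right half.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Made-up 2x2 filter that responds to a dark-to-bright vertical edge.
image_filter = np.array([
    [-1, 1],
    [-1, 1],
])

result = convolve2d_valid(image, image_filter)
print(result)
```

The output is a 3 by 3 matrix. Its middle column holds the value 2 and the other columns hold 0, because the filter only responds where it straddles the boundary between the dark and bright halves.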
