CNN Building Blocks

Explore the building blocks of convolutional neural networks, including convolution layers, dropout 2D, batch normalization, and max pooling. Understand how these components work together to extract features, reduce dimensionality, and improve training effectiveness for image classification tasks.

Convolution layer

A neural network wouldn’t be called convolutional if it didn’t have at least one convolution layer. A two-dimensional convolution layer slides a small tensor (the convolution kernel) over the height and width of the input tensor. The kernel is small in its two spatial dimensions and has the same number of channels as the input tensor. At each pixel of the input tensor, the layer computes a scalar product between the kernel and the neighborhood centered on that pixel, then adds a bias. The result is a tensor with a high activation wherever the input tensor locally looks like the convolution kernel.
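As a minimal sketch of that sliding scalar product, the pure-Python function below computes it with explicit loops. The helper name conv2d_valid and the tiny image are made up for illustration and are not part of PyTorch. The sketch assumes stride 1 and no padding, leaves out the bias, and indexes each patch by its top-left corner rather than its center, so the output is slightly smaller than the input. Like torch.nn.Conv2d, it does not flip the kernel before taking the product.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and return the scalar product at each position.

    image:  nested lists indexed as image[channel][row][col]
    kernel: nested lists indexed as kernel[channel][row][col]
    Assumes stride 1 and no padding, and omits the bias term.
    """
    n_channels = len(kernel)
    k_h, k_w = len(kernel[0]), len(kernel[0][0])
    out_h = len(image[0]) - k_h + 1
    out_w = len(image[0][0]) - k_w + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Scalar product between the kernel and the patch whose top-left corner is (y, x)
            total = 0
            for c in range(n_channels):
                for i in range(k_h):
                    for j in range(k_w):
                        total += image[c][y + i][x + j] * kernel[c][i][j]
            row.append(total)
        output.append(row)
    return output


# One-channel image: dark on the left, bright on the right
image = [[[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]]
# One-channel kernel that responds to a dark-to-bright vertical boundary
kernel = [[[-1, 1],
           [-1, 1]]]

response = conv2d_valid(image, kernel)
print(response)  # [[0, 2, 0], [0, 2, 0]]
```

The response is highest in the middle column, where the patch straddles the dark-to-bright boundary, and zero where the patch is uniform.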

To illustrate the functioning of a convolution layer, let’s design one by hand.

Consider the following image:

The bottom of a Raspberry Pi board

Let’s assume we need to highlight the areas where a green surface meets a yellow surface along a vertical boundary. To do that, we’ll convolve the image with the following kernel:

The 5x5 convolution kernel

We can manually design the convolution kernel by accessing the weight and bias fields of a torch.nn.Conv2d object:

Python
import torch
import cv2
weight = torch.zeros(1, 3, 5, 5) # (out_channels, in_channels, kernel_height, kernel_width)
bias = torch.zeros(1)
weight[0, 0, :, 0: 2] = 90 # The green band on the left
weight[0, 1, :, 0: 2] = 120
weight[0, 2, :, 0: 2] = 35
weight[0, 0, :, 2:] = 75 # The yellow band on the right
weight[0, 1, :, 2:] = 160
weight[0, 2, :, 2:] = 160
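# Save an enlarged image of the kernel (the ./output directory must already exist)
kernel_img = torch.moveaxis(weight.squeeze(0), 0, 2).to(torch.uint8).numpy() # (H, W, C)
cv2.imwrite("./output/0_kernel_weight.png", cv2.resize(kernel_img, dsize=(300, 300), interpolation=cv2.INTER_NEAREST))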
# Standardize the weight
weight = (weight - torch.mean(weight))/torch.std(weight)
bias[0] = 0.

The six slice assignments manually set the BGR values of the weight tensor: green in the two left columns of the kernel and yellow in the three right columns. The last line sets the bias value to 0.
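The code above also standardizes the weight before using it. One way to see why this helps is to compare the response of the raw and the standardized kernel on an image that contains no boundary at all. The sketch below is an illustration under stated assumptions: the uniform gray input is a made-up test image, and torch.nn.functional.conv2d is used without padding so that border effects don't enter the comparison. It first checks that a fresh torch.nn.Conv2d(3, 1, ...) layer stores its parameters in the same layout as the hand-made tensors.

```python
import torch
import torch.nn.functional as F

# A freshly created layer shows the layout that the hand-made tensors must match
layer = torch.nn.Conv2d(3, 1, kernel_size=(5, 5))
print(layer.weight.shape)  # torch.Size([1, 3, 5, 5])
print(layer.bias.shape)    # torch.Size([1])

# Rebuild the hand-designed kernel (BGR values, as in the code above)
weight = torch.zeros(1, 3, 5, 5)
weight[0, :, :, 0:2] = torch.tensor([90.0, 120.0, 35.0]).view(3, 1, 1)   # green band
weight[0, :, :, 2:] = torch.tensor([75.0, 160.0, 160.0]).view(3, 1, 1)   # yellow band
standardized = (weight - torch.mean(weight)) / torch.std(weight)

# Hypothetical test input: a uniform gray image with a batch dimension, (N, C, H, W)
flat_img = torch.full((1, 3, 9, 9), 0.5)

raw_response = F.conv2d(flat_img, weight)                 # no padding: output is (1, 1, 5, 5)
standardized_response = F.conv2d(flat_img, standardized)

print(raw_response.abs().max().item())           # large, although the image has no boundary
print(standardized_response.abs().max().item())  # close to zero
```

Because the standardized kernel has zero mean, its entries sum to approximately zero, so a uniform region produces almost no activation and the output is driven by how well the local color pattern matches the kernel.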

We can now perform a convolution of the image with a kernel that will use this weight and bias:

Python
# ... continued
# Create a 2D convolution layer with the designed weight and bias
conv = torch.nn.Conv2d(3, 1, kernel_size=(5, 5), padding='same') # kernel_size matches the 5x5 weight
# The tensors must be wrapped in torch.nn.Parameter objects
conv.weight = torch.nn.Parameter(weight)
conv.bias = torch.nn.Parameter(bias)
# Load an image
original_img = cv2.imread("./images/electronics/rpi_back.jpg")
cv2.imwrite("./output/1_original.png", original_img)
# Convert the image to a tensor
input_tsr = torch.from_numpy(original_img).float()/255.0 # (H, W, C)
input_tsr = torch.moveaxis(input_tsr, 2, 0) # (C, H, W)
# Compute the convolution on the input tensor
convolution_tsr = conv(input_tsr)
# Save the convolution image
convolution_img = convolution_tsr.squeeze(0).detach().numpy() # (H, W)
cv2.imwrite("./output/2_convolution.png", 127 + 10 * convolution_img)

In lines 5 and 6, we set the weight ...