Convolution in Practice

Find out why convolutional and pooling layers are the building blocks of Convolutional Neural Networks.

When it comes to real-life applications, most images are in fact 3D tensors, with width, height, and 3 channels (R, G, B) as their dimensions.

In that case, the kernel should also be a 3D tensor (k × k × channels). Each kernel produces a 2D feature map. Remember that the sliding happens only across the width and height; at each position, we take the dot product across all input channels. As a result, each kernel produces 1 output channel.
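To make the shapes concrete, here is a minimal sketch that slides a single 3D kernel over a 3-channel image by hand (the 32×32 image size and 3×3 kernel size are illustrative choices):

```python
import torch

# A hypothetical 3-channel input image: (channels, height, width).
image = torch.randn(3, 32, 32)

# One 3x3 kernel spanning all 3 input channels: (channels, k, k).
kernel = torch.randn(3, 3, 3)

# Slide across width and height only; at each position we take the
# dot product over all channels, so one kernel yields one 2D map.
out_h, out_w = 32 - 3 + 1, 32 - 3 + 1
feature_map = torch.empty(out_h, out_w)
for i in range(out_h):
    for j in range(out_w):
        patch = image[:, i:i + 3, j:j + 3]
        feature_map[i, j] = (patch * kernel).sum()

print(feature_map.shape)  # torch.Size([30, 30])
```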

In practice, we tend to use more than 1 kernel in order to capture different kinds of features at the same time.
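A sketch of this in PyTorch, using torch.nn.functional.conv2d with 16 kernels (the sizes are again illustrative); note that the weight tensor simply stacks one 3D kernel per output channel:

```python
import torch
import torch.nn.functional as F

# A batch with one RGB image: (batch, in_channels, height, width).
x = torch.randn(1, 3, 32, 32)

# 16 kernels, each 3x3x3; the weight tensor has shape
# (out_channels, in_channels, kernel_height, kernel_width).
weight = torch.randn(16, 3, 3, 3)

out = F.conv2d(x, weight)
print(out.shape)  # torch.Size([1, 16, 30, 30]) -- one 2D map per kernel
```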

A convolutional layer

As you may have guessed, our learnable weights are now the values of our filters, and they can be trained with backpropagation as usual. We can also add a bias term to each filter.
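This is exactly what nn.Conv2d stores as its parameters; a quick sketch to inspect them (the layer sizes are illustrative):

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

print(conv.weight.shape)          # torch.Size([16, 3, 3, 3])
print(conv.bias.shape)            # torch.Size([16]) -- one bias per filter
print(conv.weight.requires_grad)  # True: updated by backpropagation
```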

Convolutional layers can be stacked on top of one another. Since convolutions are linear operators, we include non-linear activation functions between them, just as we did with fully connected layers.
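As a sketch, two stacked convolutional layers with ReLU activations in between might look like this (the channel counts are illustrative):

```python
import torch.nn as nn

# Convolution is linear, so a non-linearity (here ReLU) goes between layers.
stack = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
    nn.ReLU(),
)
```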

To recap, you have to think in terms of input channels, output channels, and kernel size, and that is exactly how we define a convolutional layer in PyTorch.

To define a convolutional network in PyTorch, we can write something like the following.
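Below is a minimal sketch of such a definition; the class name SimpleCNN and the channel sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Each layer is defined by input channels, output channels,
        # and kernel size.
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return x

model = SimpleCNN()
out = model(torch.randn(1, 3, 32, 32))  # a batch of one RGB 32x32 image
print(out.shape)  # torch.Size([1, 32, 28, 28])
```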
