In this lesson, we are going to learn about the first step in implementing a Convolutional Neural Network: the convolution operation.

What is a convolution operation?

In mathematical terms, convolution is an operation on two functions that produces a third function, which expresses how the shape of one function is modified by the other. Sounds complicated?

That’s okay. We will walk through the process step by step so you can see how it works. But first, let’s take a look at the formula:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, \mathrm{d}\tau$$
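For sampled signals, such as a row of pixel values, the integral turns into a sum: (f * g)[n] = Σ_m f[m] g[n − m]. The short Python sketch below implements that sum directly for two 1-D lists. The function name `convolve_1d` is our own, and returning the "full" result of length len(f) + len(g) − 1 is one common convention rather than the only one. Note that `g` is indexed as `n - m`, which is the "flip and shift" hidden in the formula.

```python
def convolve_1d(f, g):
    """Full discrete convolution: (f * g)[n] = sum over m of f[m] * g[n - m]."""
    n_out = len(f) + len(g) - 1
    result = []
    for n in range(n_out):
        total = 0
        for m in range(len(f)):
            k = n - m  # index into g: g is flipped and shifted by n
            if 0 <= k < len(g):
                total += f[m] * g[k]
        result.append(total)
    return result

print(convolve_1d([1, 2, 3], [0, 1, 2]))  # [0, 1, 4, 7, 6]
```

Swapping the two arguments gives the same result, since convolution is commutative.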

If you have worked in any field that involves signal processing, you are probably already familiar with the convolution function.

If not, don’t worry: we will now look at an example of how this operation actually works.

Let’s say we have an image of a smiley face (a simple image makes the idea easier to follow, but the same concept applies to complex images too). We will now create a matrix by assigning a value of 0 to each uncolored pixel and a value of 1 to each black pixel. See the image below to understand this concept.

Now, we will use this matrix to perform the convolution operation. Try to create the matrix on your own and check whether it matches the input image matrix below.
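The pixel-to-number mapping described above can be sketched in a few lines of Python. The smiley below is a hypothetical 7 x 7 pattern drawn with characters ('#' for a black pixel, '.' for an uncolored one); it is made up for illustration and is not the exact matrix from the figure.

```python
# Hypothetical 7 x 7 smiley: '#' marks a black pixel, '.' an uncolored pixel.
SMILEY = [
    ".......",
    ".#...#.",
    ".......",
    ".......",
    "#.....#",
    ".#...#.",
    "..###..",
]

def to_binary_matrix(rows):
    """Map each black pixel to 1 and each uncolored pixel to 0."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

image = to_binary_matrix(SMILEY)
for row in image:
    print(row)
```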

There are three elements to consider in the convolution operation:

  • Input image matrix
  • Feature detector (filter)
  • Feature map

In this example, our input image is the smiley face shown above. We now use a feature detector, or filter, to convert the input image into a matrix that contains its important features. The resulting matrix is our feature map. Look at the illustration below.

Convolution Operation

Follow these steps to create a feature map:

  • Place the feature detector on the top-left corner of your input image. Multiply each cell of the filter by the image cell beneath it and add up the products; because both matrices contain only 0s and 1s, this equals the number of cells where both the filter and the image have a 1. Write this value in the top-left cell of the feature map.
  • Shift the feature detector one pixel to the right and repeat. The number of pixels the filter moves at each step is called the stride, so here we are using a stride of one pixel. You can use a stride of more than one pixel, but the filter may then leave out some important features in your image.
  • In this example, the first cell of the feature map is 0 because none of the filter’s 1s line up with a 1 in the input image.
  • When you reach the end of the first row, move the filter down to the next row, return to the left edge, and repeat the same process.

Now work through the entire input image and check whether your output matches the correct feature map.
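The steps above can be sketched in Python as a sliding window. Both the 7 x 7 image and the 3 x 3 filter below are hypothetical values chosen for illustration, not the ones in the figure. Like most deep-learning libraries, this sketch does not flip the filter, so strictly speaking it computes a cross-correlation, which is what the step-by-step procedure describes. It also uses no padding, so the filter only visits positions where it fits entirely inside the image.

```python
# Hypothetical 7 x 7 binary image (1 = black pixel, 0 = uncolored pixel).
IMAGE = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
]

# Hypothetical 3 x 3 feature detector (filter).
FILTER = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
]

def feature_map(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and record the sum of
    element-wise products at each position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for top in range(0, ih - kh + 1, stride):        # move down by `stride` rows
        row = []
        for left in range(0, iw - kw + 1, stride):   # move right by `stride` pixels
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

fmap = feature_map(IMAGE, FILTER)
for row in fmap:
    print(row)
```

With a 7 x 7 image and a 3 x 3 filter at a stride of one, the result is a 5 x 5 feature map, and its top-left cell is 0 here as well. Passing `stride=2` shrinks it to 3 x 3.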

Just as a feature detector is also called a kernel or a filter, a feature map is also known as an activation map; these terms are interchangeable.

The need for the convolution operation

The main reason is to reduce the size of the input image while keeping its important features in the feature map. The larger your stride (the number of pixels the filter moves at each step), the smaller your feature map, because the filter stops at fewer positions on the image and, with large enough strides, skips some pixels entirely. When dealing with real-world images, you may find it necessary to use larger strides. Here, we were dealing with a 7 x 7 input image, but real images tend to be substantially larger and more complex.
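The link between stride and feature-map size can be made concrete. For an n x n input, a k x k filter, a stride of s, and no padding, each side of the feature map has floor((n − k) / s) + 1 cells. The sketch below assumes a 3 x 3 filter, since the text does not state the filter size; `output_size` is a name introduced here for illustration.

```python
def output_size(n, k, stride):
    """Side length of the feature map for an n x n input and a k x k filter
    with no padding: floor((n - k) / stride) + 1."""
    return (n - k) // stride + 1

# A 7 x 7 input with an assumed 3 x 3 filter at increasing strides.
for s in (1, 2, 3):
    side = output_size(7, 3, s)
    print(f"stride {s}: {side} x {side}")
```

This prints 5 x 5, 3 x 3, and 2 x 2. With a stride of 3, the filter stops only at columns 0 and 3, so the last column of the 7 x 7 image is never covered.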

How do CNNs actually perform convolution?

The example above is a very simplified one. In reality, a Convolutional Neural Network learns many feature detectors and applies each one to the input to produce its own feature map. Stacked together, these feature maps form a convolutional layer (see the figure below).

Multiple feature maps are stacked together to create a convolutional layer
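A convolutional layer can be sketched as "one feature map per filter". In the Python sketch below, the two filters are hypothetical and hand-picked so the outputs are easy to read; in a real CNN, the filter values are learned during training. The helper names `feature_map` and `conv_layer` are our own.

```python
def feature_map(image, kernel):
    """Stride-1, no-padding sum of element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]

def conv_layer(image, kernels):
    """Apply every filter to the same input; each one yields its own feature map."""
    return [feature_map(image, k) for k in kernels]

# Hypothetical 5 x 5 input: a vertical line in the middle column.
IMAGE = [[0, 0, 1, 0, 0] for _ in range(5)]

# Two hypothetical filters (a trained CNN would learn these values).
VERTICAL = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
HORIZONTAL = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]

layer = conv_layer(IMAGE, [VERTICAL, HORIZONTAL])
print(len(layer), "feature maps, each", len(layer[0]), "x", len(layer[0][0]))
for fmap in layer:
    print(fmap)
```

The vertical filter responds with a 3 along the line, while the horizontal filter produces only 1s everywhere, so each map highlights a different feature of the same input.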

Now that you have a good understanding of the convolution operation, it is worth knowing that it has other useful applications, such as:

  • Sharpening an image
  • Blurring an image
  • Detecting edges in an image
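These applications use the same sliding-window operation, but with hand-designed kernels instead of learned ones. The Python sketch below applies three widely used 3 x 3 kernels (a box blur, a common sharpening kernel, and a Laplacian edge detector) to a hypothetical step image; many variants of each kernel exist. Image-editing tools usually pad the borders so the output keeps the input size, whereas this sketch uses no padding for simplicity.

```python
def apply_kernel(image, kernel):
    """Stride-1, no-padding sum of products; the output shrinks by
    (kernel size - 1) in each dimension."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]

BLUR = [[1 / 9] * 3 for _ in range(3)]           # box blur: average of the 3 x 3 neighborhood
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]  # boosts the center relative to its neighbors
EDGES = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]       # Laplacian: zero on flat regions

# Hypothetical 5 x 5 grayscale image: dark on the left, bright on the right.
STEP = [[0, 0, 10, 10, 10] for _ in range(5)]

for name, k in [("blur", BLUR), ("sharpen", SHARPEN), ("edges", EDGES)]:
    print(name, apply_kernel(STEP, k)[0])
```

On this image, the edge kernel outputs 0 in the flat bright region and non-zero values only around the dark-to-bright boundary. The sharpening kernel leaves the flat bright region at 10 but exaggerates the jump at the edge, and the blur smooths the jump into intermediate values.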