Layers in a CNN

A typical CNN is built from several kinds of layers, including convolutional, pooling, activation, and batch normalization layers.

Convolutional (conv) layer

This is one of the most important layers and a building block of CNNs, responsible for most feature extraction. By applying various filters to the input image, the convolutional layer extracts different patterns, such as edges or textures. The output of this layer is known as a feature map.

What are kernels/filters/feature detectors?

In simple terms, a kernel looks at one small area of the input at a time and extracts features from it, so each value in the output feature map depends only on a small region of the input. Each kernel is a small matrix of weights. These weights are learnable parameters that the CNN adjusts during training to extract useful features.

  • The kernel extracts features by moving across the entire image as a sliding window. At each position, it computes a dot product (multiply element-wise, then sum) between the kernel weights and the input values under the window.

  • These filters have small dimensions (with respect to the width and height of the image), but they have the same depth as the input volume.

    • For the first layer, depth = the number of channels in the image (for example, grayscale images have 1 channel and RGB images have 3).

    • For deeper layers, depth = the number of filters used in the previous convolutional layer, since each filter produces one feature map.
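The sliding-window dot product described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than how a deep learning library implements it: the function name `conv2d_single_filter`, the toy 5x5 image, the hand-picked vertical-edge kernel, and the choice of stride 1 with no padding are all assumptions made for the example.

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Slide one kernel over an image of shape (H, W, C), stride 1, no padding."""
    h, w, c = image.shape
    kh, kw, kc = kernel.shape
    # The kernel is small in width and height but spans the full input depth.
    assert kc == c, "kernel depth must match input depth"
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]   # small area under the window
            out[i, j] = np.sum(patch * kernel)     # dot product of weights and inputs
    return out

# Toy grayscale image (depth 1): dark on the left, bright on the right.
image = np.zeros((5, 5, 1))
image[:, 3:, 0] = 1.0

# Hand-picked (not learned) kernel that responds to vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3).reshape(3, 3, 1)

feature_map = conv2d_single_filter(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # each row is [0, 3, 3]: strong response near the edge
```

The output is zero where the window covers only the dark region and 3 where it straddles the dark-to-bright boundary, which is the sense in which a filter "detects" a pattern. A layer with several filters would repeat this once per filter and stack the resulting feature maps along the depth axis.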

How are the weights of the kernel updated?

The kernel is initialized with random values. During training, backpropagation computes the gradient of the loss function with respect to each kernel value, and an optimizer (such as gradient descent) uses that gradient to update the kernel so that the loss decreases.
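As a rough illustration of this update, the sketch below trains a single 3x3 kernel with plain gradient descent. Everything specific here is an assumption chosen for the example: the mean squared error loss, the learning rate, the random 6x6 input, the target produced by a known `true_kernel`, and the helper names `conv2d` and `kernel_gradient`. Real frameworks compute the same gradient automatically.

```python
import numpy as np

def conv2d(image, kernel):
    """Single-channel sliding-window dot product, stride 1, no padding."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def kernel_gradient(image, grad_out, kernel_shape):
    """Gradient of the loss w.r.t. each kernel weight, given dLoss/dOutput."""
    oh, ow = grad_out.shape
    grad = np.zeros(kernel_shape)
    for a in range(kernel_shape[0]):
        for b in range(kernel_shape[1]):
            # Weight (a, b) touched image[i + a, j + b] for every output (i, j).
            grad[a, b] = np.sum(grad_out * image[a:a + oh, b:b + ow])
    return grad

rng = np.random.default_rng(0)
image = rng.normal(size=(6, 6))
true_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # assumed "ideal" filter
target = conv2d(image, true_kernel)

kernel = rng.normal(scale=0.1, size=(3, 3))      # random initialization
learning_rate = 0.01                             # assumed value
losses = []
for step in range(200):
    out = conv2d(image, kernel)                  # forward pass
    error = out - target
    losses.append(np.mean(error ** 2))           # mean squared error loss
    grad_out = 2.0 * error / error.size          # dLoss/dOutput
    grad_kernel = kernel_gradient(image, grad_out, kernel.shape)  # backward pass
    kernel -= learning_rate * grad_kernel        # gradient descent update

print(losses[0], losses[-1])  # the loss should shrink over training
```

Each iteration moves the kernel a small step against the gradient, so the random starting values drift toward weights that reduce the loss.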
