The Principles of Convolution

Learn about the convolution operation and how it is used in deep learning.

We'll cover the following

Why convolution?

The fully connected layer that we saw doesn’t respect the spatial structure of the input. If, for example, the input is an image, the network flattens the 2D structure into a 1-dimensional vector. Convolutional Neural Networks (CNNs) were designed to address this issue, and they work exceptionally well for computer vision applications.

Why do we use them when we process images? Because we know a priori that nearby pixels share similar characteristics, and we want to take that into account by design. Such a built-in assumption is called an inductive bias.

Convolutional layers exploit the local structure of the data.

Representation of a Convolutional Neural Network

But how is it possible to focus on the local structure, when fully connected layers take linear combinations of the entire input?

The answer is quite simple. We restrict the convolutional layer to operate on a local window called a kernel. Then, we slide this window across the input image.

Convolution

The basic operation of CNNs is the convolution. Mathematically, a convolution between two 2-dimensional functions is defined as:

(f*g)(i,j) = \sum_{a} \sum_{b} f(a,b) \, g(i-a, j-b)

Even though one of the two signals is flipped in the mathematical definition before the sliding, this flip is ignored in deep learning, so frameworks actually compute a cross-correlation. The reason is that the weights inside the kernel are trainable: if a flipped kernel is what the network needs, it can simply learn one.
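To make this concrete, here is a minimal sketch (using scipy.signal, which we assume is available; the toy image and kernel values are made up for illustration) showing that correlating with a flipped kernel reproduces the true convolution:

```python
import numpy as np
from scipy import signal

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])                   # toy 2x2 kernel

# True convolution flips the kernel before sliding it.
conv = signal.convolve2d(image, kernel, mode="valid")

# Cross-correlation slides the kernel as-is (the deep learning convention).
corr = signal.correlate2d(image, kernel, mode="valid")

# Correlating with a kernel flipped along both axes gives true convolution.
assert np.allclose(conv, signal.correlate2d(image, np.flip(kernel), mode="valid"))
```

Since the kernel weights are learned anyway, the two conventions are equivalent in practice.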

Convolution between two signals

Visually, we can represent a 2x2 kernel operating on a 4x4 image as follows:

The convolution operation

As the kernel slides over the image, each operation between the kernel and the underlying image patch is a dot product, which produces a scalar (shown in blue in the output).

The output is called a feature map.

We can see that, given a matrix (our input) and a smaller weight matrix (the kernel), we can produce a single scalar number. This number is the result of the dot product between a small chunk of our input and the kernel.

It is interesting to understand that this dot product is a measure of correlation (similarity). CNNs are great at learning the spatial correlations of neighboring pixels.

Below is an example of a 3x3 chunk of the image (called a patch) with a 3x3 kernel:

f*g = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} * \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}

f*g = (a \cdot 1) + (b \cdot 2) + (c \cdot 3) + (d \cdot 4) + (e \cdot 5) + (f \cdot 6) + (g \cdot 7) + (h \cdot 8) + (i \cdot 9)

Again, it is simply a dot product.
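In NumPy, this is a one-liner: multiply the patch and the kernel element-wise and sum the result. The numeric values below are made up for illustration:

```python
import numpy as np

patch = np.array([[1, 0, 2],
                  [3, 1, 0],
                  [0, 2, 1]])   # a hypothetical 3x3 patch of the image
kernel = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Element-wise product followed by a sum: the dot product of the
# flattened patch with the flattened kernel.
scalar = np.sum(patch * kernel)
print(scalar)  # 1 + 0 + 6 + 12 + 5 + 0 + 0 + 16 + 9 = 49
```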

To recap: Given an input matrix N×N and a kernel p×p, where p < N:

  • We slide the filter across every possible position of the input matrix.
  • At each position, we perform a dot-product operation and calculate a scalar.
  • We gather all these scalars to form the output, which is called the feature map.
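These three steps translate directly into code. Below is a minimal NumPy sketch, assuming a stride of 1 and no padding (neither is specified above):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` and return the feature map.

    With stride 1 and no padding, an N x N input and a p x p kernel
    produce an (N - p + 1) x (N - p + 1) output.
    """
    H, W = image.shape
    p, q = kernel.shape
    out = np.zeros((H - p + 1, W - q + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product between the current patch and the kernel.
            out[i, j] = np.sum(image[i:i + p, j:j + q] * kernel)
    return out
```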

So what did we achieve here?

We transformed a 2D matrix from the input space to the feature space without losing the 2D form of the input. That way, the network can capture context that appears only in parts of the image and would otherwise be lost in a fully connected layer.

Intuitively, CNNs are able to recognize patterns in images such as edges, corners, circles, etc. From another perspective, CNNs can be thought of as locally connected neural networks — as opposed to fully connected — because each pixel of the feature map is affected only by a local region of the input rather than the entire image.

Important notes

  • Convolution is still a linear operator.
  • The weights inside the kernel are trainable and are shared across the input.
  • Each dot-product operation gives a notion of similarity.
  • Convolutions can be performed in any number of dimensions.
  • The axes along which we slide the kernel define the dimension of the convolution. For images, it is a 2D convolution, but we can also apply 1D convolutions to sequences that have some kind of local structure.
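For instance, a 1D convolution slides the kernel along a single axis and takes a dot product at each position. A minimal sketch, again assuming stride 1 and no padding, with made-up values:

```python
import numpy as np

sequence = np.array([1.0, 3.0, 2.0, 5.0, 4.0])  # a toy 1D signal
kernel = np.array([0.5, 0.5])                   # a toy smoothing kernel

# Slide the kernel along the single axis and take a dot product at
# each position (cross-correlation convention, as in deep learning).
out = np.array([np.dot(sequence[i:i + len(kernel)], kernel)
                for i in range(len(sequence) - len(kernel) + 1)])
print(out)  # [2.  2.5 3.5 4.5]
```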

If you understand the basics of convolution, you should be able to implement it from scratch in Python. It is just a few lines of code. In the exercise below, you have a simple function that receives a 2D image and a 2D kernel. The goal is to output the result of their convolution.

Your code will be tested on 4 different images. The kernel will always be of size (3, 3), and the images will be (8, 8), (12, 10), (10, 10), and (12, 8), respectively.
