
Two-dimensional Convolutions

Explore the mechanics of two-dimensional convolutions used in computer vision applications. Understand different convolution types including default, padding, strides, and their effects on output size. This lesson helps you grasp foundational convolution operations crucial for building convolutional neural networks using JAX.

Note: The animations that explain convolution mechanics are used here with special thanks to Vincent Dumoulin and Francesco Visin, authors of “A guide to convolution arithmetic for deep learning” [arXiv:1603.07285].

We restricted the last lesson to one-dimensional convolution. In computer vision applications, however, we need to operate in more than one dimension. In this and subsequent lessons, we’ll up the ante by upgrading to two-dimensional convolutions.

Introduction

We can extend the convolution to two dimensions:

$$(f * g)[m, n] = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} f[i, j]\, g[m - i, n - j]$$

Since two-dimensional convolution is used frequently in computer vision applications, we’ll invest more time explaining its mechanics.
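To make the double sum concrete, here is a minimal sketch that evaluates it directly for small finite arrays, assuming every entry outside an array is zero. The helper name conv2d_direct and the sample values are hypothetical and chosen only for illustration; the result is cross-checked against jax.scipy.signal.convolve2d.

```python
import jax.numpy as jnp
from jax.scipy.signal import convolve2d


def conv2d_direct(f, g):
    """Evaluate (f * g)[m, n] = sum_i sum_j f[i, j] g[m - i, n - j] for
    finite arrays, treating entries outside either array as zero."""
    fh, fw = f.shape
    gh, gw = g.shape
    out = jnp.zeros((fh + gh - 1, fw + gw - 1), dtype=f.dtype)
    for i in range(fh):
        for j in range(fw):
            # f[i, j] contributes to every output index (m, n) = (i + p, j + q),
            # where (p, q) = (m - i, n - j) ranges over the indices of g.
            out = out.at[i:i + gh, j:j + gw].add(f[i, j] * g)
    return out


# Hypothetical sample arrays.
f = jnp.array([[1.0, 2.0], [3.0, 4.0]])
g = jnp.array([[0.0, 1.0], [1.0, 0.0]])

result = conv2d_direct(f, g)
reference = convolve2d(f, g)  # "full" mode by default
print(result)
print(jnp.allclose(result, reference))
```

The explicit loops are only meant to mirror the formula term by term; they are far too slow for real images.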

Preliminaries

Throughout the examples, we will assume the following settings:

  • The input image I (shown in blue) has the dimensions $m \times n$.
  • The convolving kernel/filter F has the dimensions $f \times f$ (square kernels are the usual standard).
  • The output image O (shown in green) has the dimensions $x \times y$.

Implementation

We used SciPy’s convolve() as an N-dimensional convolution in the last lesson. Here, we’ll work directly with the lower-level function it builds on.

JAX and its various neural network libraries provide a number of different convolution functions. Behind all those functions (including the SciPy-style convolve()) is the fundamental implementation, jax.lax.conv_general_dilated().

This function takes four required parameters:

  • Input matrix (lhs)
  • Kernel/filter matrix (rhs)
  • Stride (window_strides) - we’ll use (1,1) by default
  • Padding (padding) - we’ll use [(0,0),(0,0)] by default

Note: Usually, 2D convolution requires a 4D volume due to channels and batch size, but we’ll keep it simple here by using single 2D matrices for I, O, and F.
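As a rough sketch of how these pieces fit together, the snippet below calls jax.lax.conv_general_dilated() on single 2D matrices by adding singleton batch and channel axes. The 4×4 input and the 3×3 all-ones kernel are hypothetical values picked for illustration, and the axis layout assumes the function's default dimension numbers.

```python
import jax.numpy as jnp
from jax import lax

# Hypothetical 4x4 input image I and 3x3 kernel F, chosen only for illustration.
I = jnp.arange(16, dtype=jnp.float32).reshape(4, 4)
F = jnp.ones((3, 3), dtype=jnp.float32)

# conv_general_dilated expects 4D operands. With the default dimension
# numbers, the input is (batch, channels, height, width) and the kernel is
# (out_channels, in_channels, height, width), so we add two singleton axes.
lhs = I[None, None, :, :]
rhs = F[None, None, :, :]

out = lax.conv_general_dilated(
    lhs,
    rhs,
    window_strides=(1, 1),      # stride of 1 along height and width
    padding=[(0, 0), (0, 0)],   # no padding on either axis
)

# Drop the singleton batch and channel axes to get back a 2D matrix O.
O = out[0, 0]
print(O.shape)
print(O)
```

Note that this function slides the kernel without flipping it. The all-ones kernel used here is symmetric, so the flip makes no difference to the result.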

Types of convolution

There are a few varieties of convolution, depending on whether or not we’re using a stride or padding. We’ll quickly review them.

Default

In default mode, we convolve the filter/kernel over the input. The resulting image inevitably shrinks in size:

$$x = m - f + 1$$

and similarly,

...

Default mode of convolution
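The shrinkage in default mode can be checked empirically. The sketch below is only an illustration: the helper name default_conv_shape and the sample sizes are hypothetical, and it assumes stride (1,1) with no padding, as in the settings above.

```python
import jax.numpy as jnp
from jax import lax


def default_conv_shape(m, n, f):
    """Return the 2D output shape of an unpadded, stride-1 convolution."""
    I = jnp.ones((1, 1, m, n))  # (batch, channels, height, width)
    F = jnp.ones((1, 1, f, f))  # (out_channels, in_channels, height, width)
    O = lax.conv_general_dilated(
        I, F, window_strides=(1, 1), padding=[(0, 0), (0, 0)]
    )
    return O.shape[2:]  # keep only the two spatial dimensions


# Hypothetical sample sizes for the input (m x n) and the kernel (f x f).
for m, n, f in [(5, 5, 3), (7, 4, 2), (28, 28, 5)]:
    x, y = default_conv_shape(m, n, f)
    assert x == m - f + 1  # the height follows the formula above
    print(f"I: {m}x{n}, F: {f}x{f} -> O: {x}x{y}")
```

Each run of the loop prints the output size next to the input and kernel sizes, so the effect of a larger kernel on the output is easy to see.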