Architecture of Convolutional Networks
Explore the structure of convolutional networks, from grid-like inputs to the dense layer output.
Structure
This section walks through the structure of a convolutional network. The illustration below shows an elementary convolutional network.
The components of the network are as follows:
- Grid-like input: Convolutional layers take grid-like inputs. The input in the illustration is image-like, that is, it has two spatial axes and three channels, one each for red, green, and blue.
- Convolutional layer: A layer comprises filters, and a filter is made up of kernels, one kernel for each input channel. The size of a layer is essentially the number of filters in it, which is a network configuration choice. Here, five illustrative filters, such as diagonal stripes, horizontal stripes, diamond grid, shingles, and waves, are shown in the convolutional layer. Each of them has red, green, and blue channels to match the input.
- Convolutional output: A filter sweeps the input to detect the presence or absence of a pattern and its location in the input. The output corresponding to each filter is shown with a black square of the same pattern in the figure. Note that the colored channels of the input are absent in the layer's output. This is because the information across the channels is aggregated during the convolution operation. Consequently, the original input channels are relinquished; instead, the output of each filter becomes a channel for the next layer.
- Pooling layer: A convolutional layer is typically paired with a pooling layer. The pooling layer summarizes the spatial features, which are the horizontal and vertical axes in the illustration.
- Pooling output: Pooling reduces the size of the spatial axes by summarizing the data along them. This makes the network invariant to minor translations and robust to noisy inputs. Note that pooling occurs only along the spatial axes, so the number of channels remains intact.
- Flatten: The feature map so far is still in a grid-like structure. The flatten operation vectorizes the grid feature map, which is necessary before passing it to a dense output layer.
- Dense (output) layer: Finally, a dense layer maps the convolution-derived features to the response.
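The components above can be assembled into a minimal Keras model. This is only a sketch: the input size (28×28 with 3 channels), the five 3×3 filters, the 2×2 pool, and the ten-class output are illustrative assumptions chosen to mirror the figure, not values prescribed by the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 3)),                  # grid-like input: 2 spatial axes, 3 channels
    layers.Conv2D(5, kernel_size=(3, 3),
                  activation="relu"),                   # 5 filters, as in the illustration
    layers.MaxPooling2D(pool_size=(2, 2)),              # pooling summarizes the spatial axes
    layers.Flatten(),                                   # vectorize the grid feature map
    layers.Dense(10, activation="softmax"),             # dense output layer
])

model.summary()
```

Tracing the shapes, the 28×28×3 input becomes a 26×26×5 convolutional output (one channel per filter), pooling halves the spatial axes to 13×13×5, and flattening produces an 845-element vector for the dense layer.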
A convolutional network’s purpose is to automatically learn predictive filters from data. Multiple layers are often stacked to learn from low- to high-level features. For instance, a face recognition network could learn the edges of a face in the lower layers and the shape of eyes in the higher layers.
Note: The purpose of a convolutional network is to learn the filters automatically.
Conv1D, Conv2D, and Conv3D
In TensorFlow, the convolutional layer can be chosen from Conv1D, Conv2D, and Conv3D. The three types of layers are designed for inputs with one, two, or three spatial axes, respectively. Let's look at when each of them is applicable and their interchangeability.
Convolutional networks work with grid-like inputs. Such inputs are categorized based on their axes and channels. The table below summarizes these for a few kinds of grid-like data: time series, images, and videos.
Axes and Channels in Grid-Like Inputs to Convolutional Networks

| | Time Series | Image | Video |
| --- | --- | --- | --- |
| Axis-1 (Spatial dim1) | Time | Height | Height |
| Axis-2 (Spatial dim2) | - | Width | Width |
| Axis-3 (Spatial dim3) | - | - | Time |
| Channels | Features (one in a univariate time series) | Colors | Colors |
| Conv'x'D | Conv1D | Conv2D | Conv3D |
| Input Shape | (samples, time, features) | (samples, height, width, colors) | (samples, height, width, time, colors) |
| Kernel Size | An integer, t, specifying the time window | An integer tuple, (h, w), specifying the height and width window | An integer tuple, (h, w, t), specifying the height, width, and time window |
| Kernel Shape | (t, features) | (h, w, colors) | (h, w, t, colors) |
A univariate time series has a single spatial axis corresponding to time. If it is multivariate, the features make up the channels. Irrespective of the number of channels, a time series is modeled with Conv1D because it has only one spatial axis.
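A short sketch of this point: the same Conv1D configuration applies to a univariate series (one channel) and a multivariate series (here, an assumed seven features), and in both cases the number of output channels equals the number of filters. The batch size, series length, and feature counts are illustrative assumptions.

```python
import tensorflow as tf

# One Conv1D layer per input, since a layer's kernel depth is
# fixed to its input's channel count once built.
conv_u = tf.keras.layers.Conv1D(filters=8, kernel_size=5)
conv_m = tf.keras.layers.Conv1D(filters=8, kernel_size=5)

univariate = tf.random.normal([4, 100, 1])    # (samples, time_steps, features=1)
multivariate = tf.random.normal([4, 100, 7])  # (samples, time_steps, features=7)

y_u = conv_u(univariate)
y_m = conv_m(multivariate)

print(y_u.shape)  # (4, 96, 8): 8 filters, time shrunk by kernel_size - 1
print(y_m.shape)  # (4, 96, 8): same output shape despite 7 input channels
```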
Images, on the other hand, have two spatial axes, along their height and width. Videos have an additional spatial axis, perhaps counterintuitively, along time. Conv2D and Conv3D are, therefore, applicable to them, respectively. The channels in both are the palette colors: red, green, and blue.
Note: Conv1D, Conv2D, and Conv3D are used to model inputs with one, two, and three spatial axes, respectively.
The Conv'x'D selection is independent of the channels: there can be any number of channels holding arbitrary features. The Conv'x'D is chosen based on the number of spatial axes only.
Inputs to Conv1D, Conv2D, and Conv3D are structured as 3-D, 4-D, and 5-D tensors of the following shapes, respectively:

- (samples, time_steps, features)
- (samples, height, width, channels)
- (samples, height, width, time_steps, channels)
The first axis is reserved for samples for almost every layer in TensorFlow.
The shape of a sample is defined by the rest of the axes (shown in the illustrations above and below).
Among them, the last axis corresponds to the channels (by default) in any of the Conv'x'D layers.
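The three input shapes can be sketched with illustrative tensors (all sizes here are assumptions) to confirm that each layer accepts the corresponding rank and leaves the channels to the last axis.

```python
import tensorflow as tf

series = tf.random.normal([2, 50, 3])         # (samples, time_steps, features)
images = tf.random.normal([2, 32, 32, 3])     # (samples, height, width, channels)
videos = tf.random.normal([2, 16, 16, 8, 3])  # (samples, height, width, time_steps, channels)

y1 = tf.keras.layers.Conv1D(4, kernel_size=3)(series)
y2 = tf.keras.layers.Conv2D(4, kernel_size=(3, 3))(images)
y3 = tf.keras.layers.Conv3D(4, kernel_size=(3, 3, 3))(videos)

# Each spatial axis shrinks by kernel_size - 1; the last axis becomes
# the number of filters (4), replacing the original channels.
print(y1.shape)  # (2, 48, 4)
print(y2.shape)  # (2, 30, 30, 4)
print(y3.shape)  # (2, 14, 14, 6, 4)
```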
The kernel_size argument in Conv'x'D determines the spatial dimensions of the convolution kernel. The argument is a tuple of integers, and each element corresponds to the kernel's size along the respective spatial axis. The depth of the kernel is fixed and equal to the number of channels, and is therefore not included in the argument.
Note: Conv layers are agnostic to the number of channels. They differ only by the shape of the input’s spatial axes.
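A minimal sketch of the kernel depth being inferred rather than specified: the 28×28 input size and three channels below are illustrative assumptions.

```python
import tensorflow as tf

# kernel_size names only the spatial window (3, 3); no depth is given.
layer = tf.keras.layers.Conv2D(filters=5, kernel_size=(3, 3))
layer.build(input_shape=(None, 28, 28, 3))  # 3 input channels

# The kernel weights have shape (h, w, input_channels, filters):
# the depth of 3 was taken from the input's channel axis.
print(layer.weights[0].shape)  # (3, 3, 3, 5)
```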
Besides, we might observe that a Conv2D can be used to model the inputs of Conv1D by appropriately reshaping the samples. For example, a time series can be reshaped as ...