Bottleneck
Learn to implement the bottleneck block used in ResNet architectures and understand how its squeeze-expand design, built from three convolution layers, reduces weight parameters and makes very deep networks practical to train. This lesson guides you through coding the block, including its third (expand) convolution layer, and through the parameter efficiency that deep ResNet models depend on.
Chapter Goals:
- Learn about the bottleneck block and why it's used
- Implement a function for a bottleneck block
A. Size considerations
The ResNet block we described in the previous chapter is the main building block for models with fewer than 50 layers. Once we hit 50 or more layers, we want to take advantage of the greater model depth and use more filters in each convolution layer. However, more filters means more weight parameters, which can lead to extremely long training times.
To counter this, ResNet incorporates the same squeeze and expand concept used by the SqueezeNet fire module. The ResNet blocks for 50+ layer models will now use 3 convolution layers rather than 2, where the first convolution layer squeezes the number of channels in the data and the third convolution layer expands the number of channels. We refer to these blocks as bottleneck blocks.
B. Bottleneck block
The third convolution layer of a bottleneck block uses four times as many filters as the corresponding regular ResNet block. This means the input to each subsequent bottleneck block has four times as many channels (remember that the output of one block is the input to the next). Hence, the first convolution layer acts as a squeeze layer, reducing the number of channels back to the regular amount. For example, if a regular block at a given stage would use 64 filters, the bottleneck block's third layer outputs 256 channels, and the next block's first layer squeezes those 256 channels back down to 64.
Another similarity to the SqueezeNet fire module is the mixed usage of 1x1 and 3x3 kernels. The bottleneck block uses 1x1 kernels for the first and third convolution layers, while the middle convolution layer still uses 3x3 kernels. This helps reduce the number of weight parameters while still maintaining good performance.
C. Parameter comparison
In the SqueezeNet Lab, we introduced the equation to calculate the number of weight parameters in a convolution layer:

$$\text{weights} = k^2 \cdot c_{in} \cdot c_{out}$$

where $k$ is the kernel dimension (for a square $k \times k$ kernel), $c_{in}$ is the number of input channels, and $c_{out}$ is the number of filters in the layer.