
InceptionV1 (GoogLeNet, 2014)

Understand the InceptionV1 (GoogLeNet) architecture that won the 2014 ILSVRC, featuring its network-in-network approach, multiple classifier heads, and efficient convolution techniques. Learn how this model balances depth and parameter efficiency for improved image classification performance.

General structure

InceptionV1 is the image classification architecture that won the ILSVRC competition in 2014.

  • It is a 22-layer architecture that applies the network-in-network approach in special layers called Inception modules.

  • Its training strategy is similar to that of other architectures: SGD with a momentum of 0.9, a fixed learning rate schedule decreasing by 4% every 8 epochs, dropout with a rate of 0.4 at the fully connected layers, ReLU activations in the Inception modules, and softmax at the end.

  • Average pooling is applied between the final convolution layer and fully connected ones.

  • Instead of having one fully connected head, they have three. The two additional fully connected extensions are called auxiliary classifiers. The interesting part is that all three heads are used during training: the auxiliary losses are added to the main loss with a discount weight of 0.3, which pushes useful gradients into the middle layers of the network. At inference time, the auxiliary classifiers are discarded and only the main head is used.
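The training-time bookkeeping above can be sketched in plain Python. The 0.3 auxiliary-loss weight and the 4%-every-8-epochs schedule come from the text; the base learning rate of 0.01 is an illustrative assumption, not a value from the source:

```python
def total_loss(main_loss, aux_losses, aux_weight=0.3):
    # During training, each auxiliary classifier's loss is added to the
    # main loss with a discount weight (0.3). At inference, only the
    # main head is used, so aux_losses would simply be empty.
    return main_loss + aux_weight * sum(aux_losses)

def learning_rate(epoch, base_lr=0.01):
    # Fixed schedule: multiply the learning rate by 0.96 (a 4% decrease)
    # once every 8 epochs. base_lr is an assumed illustrative value.
    return base_lr * 0.96 ** (epoch // 8)

print(total_loss(1.0, [0.8, 0.9]))  # main + 0.3 * (aux1 + aux2)
print(learning_rate(16))            # two decay steps: 0.01 * 0.96**2
```

With two auxiliary heads contributing losses of 0.8 and 0.9, the combined objective is 1.0 + 0.3 × 1.7 ≈ 1.51.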

Network-in-network

The main idea of network-in-network layers is to apply convolutions of different sizes to the same input and concatenate the resulting feature maps into a single output. This gives the layer feature maps at multiple scales from one input, increasing the variety of information extracted from the image and thus widening the learning capacity of the model.

Following this logic, a network-in-network layer can be built with any combination of convolution filter sizes. In this model, the layers that use the network-in-network approach are called Inception modules. The structure is as follows:
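The channel arithmetic of a naive Inception module can be sketched in plain Python. With "same" padding every branch keeps the spatial size, so the branches concatenate along the channel axis; the filter counts below are illustrative assumptions, not values from the paper's tables:

```python
def naive_inception_out_channels(in_ch, n1x1, n3x3, n5x5):
    # Naive Inception module: four parallel branches over the same input.
    # The 1x1, 3x3, and 5x5 convolution branches each contribute their
    # own filter count; the 3x3 max-pool branch keeps the input depth
    # unchanged. Concatenating along the channel axis sums the depths.
    return n1x1 + n3x3 + n5x5 + in_ch

# Illustrative example: a 192-channel input and assumed branch widths.
print(naive_inception_out_channels(192, 64, 128, 32))  # 64+128+32+192 = 416
```

Note how the pooling branch passes the full input depth through, which is one reason the naive version's output depth grows quickly as modules are stacked.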

Inception module: naive version

Auxiliary classifiers

Apart from the main classifier head at the end of the model, they create two extensions to make predictions from different scales and call these additional parts auxiliary classifiers. An auxiliary classifier’s structure is as follows:

  • An average pooling layer with 5×5 filter size and stride 3, resulting in a 4×4×512 output for the first auxiliary extension and 4×4×528 for the second one.

  • A 1×1 convolution with 128 filters for dimension reduction and rectified linear activation (ReLU).

  • A fully connected layer with 1024 units and rectified linear activation (ReLU), followed by dropout with a 70% ratio (40% in the main classifier head).

  • A fully connected layer with softmax activation function as the classifier, predicting the same 1000 classes as the primary classifier.
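The steps above can be traced as a shape walkthrough in plain Python. The 14×14 input size for the first auxiliary classifier follows from the 4×4×512 pooled output stated above; shapes are (H, W, C) with the batch dimension omitted:

```python
def aux_classifier_shapes(in_hw, in_ch):
    # Trace tensor shapes through one auxiliary classifier.
    h, w = in_hw
    # 5x5 average pooling with stride 3 (no padding):
    # out = (in - 5) // 3 + 1, e.g. 14 -> 4
    h, w = (h - 5) // 3 + 1, (w - 5) // 3 + 1
    return [
        ("avg_pool_5x5_s3", (h, w, in_ch)),
        ("conv_1x1_128",    (h, w, 128)),   # 1x1 conv for dimension reduction
        ("fc_1024",         (1024,)),       # flatten, then 1024-unit FC + ReLU
        ("fc_softmax_1000", (1000,)),       # 1000-way softmax classifier
    ]

# First auxiliary classifier: 14x14x512 input -> 4x4x512 after pooling.
for name, shape in aux_classifier_shapes((14, 14), 512):
    print(name, shape)
```

Running the same trace with a 528-channel input reproduces the 4×4×528 pooled output quoted for the second auxiliary classifier.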

Inception architecture

The above visualization shows the two auxiliary classifiers and the main one at the top. We also see that the architecture starts with basic operations such as regular convolution and max pooling blocks, and then consists mostly of Inception blocks. Note that in each box drawn in the above diagram ...