EfficientNet (2019)

Learn the fundamentals of the EfficientNet image classification architecture with compound scaling.

General structure

EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all the dimensions of a neural network rather than scaling each dimension independently. The authors argue that a well-designed architecture should preserve the proportions between its dimensions, so we should not break this balance by tuning each dimension manually. For this purpose, they introduce a single coefficient, called the compound scaling coefficient, and use it to determine the final scale factor for each dimension.

The figure below shows the different scales of a convolutional neural network. Widening a network means increasing the number of channels in convolutional layers. Deepening a network means increasing the number of layers in a neural network. Finally, resolution is the size of the input image.
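To make the three dimensions concrete, here is a minimal sketch (with hypothetical base values, not EfficientNet's actual configuration) of scaling each dimension of a network configuration independently:

```python
# Hypothetical base configuration (illustrative numbers only).
base = {"channels": 32, "layers": 10, "resolution": 224}

def scale_network(base, width_mult=1.0, depth_mult=1.0, res_mult=1.0):
    """Scale each dimension of a network configuration.

    - width_mult widens the network (more channels per conv layer)
    - depth_mult deepens the network (more layers)
    - res_mult   enlarges the input image
    """
    return {
        "channels": round(base["channels"] * width_mult),
        "layers": round(base["layers"] * depth_mult),
        "resolution": round(base["resolution"] * res_mult),
    }

# Widening only vs. scaling all three dimensions at once.
print(scale_network(base, width_mult=2.0))
print(scale_network(base, width_mult=1.1, depth_mult=1.2, res_mult=1.15))
```

Scaling only one dimension (as in the first call) is the conventional approach; compound scaling adjusts all three together, as in the second call.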

Figure: Depth, width, and resolution scales in a neural network vs. compound scaling

The main logic is to first fix the compound scaling coefficient (Φ) to 1 and apply a grid search over the width, depth, and resolution coefficients. The next step is to fix these newly found coefficients and scale the network up by increasing the compound coefficient Φ, which yields the final scale factors.

Note: Some constraints are imposed to keep the search well behaved. All the base coefficients (before applying the compound coefficient) must be greater than or equal to 1, and the product of the depth coefficient, the square of the width coefficient, and the square of the resolution coefficient should be approximately 2, so that each increment of Φ roughly doubles the network's FLOPS.
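The grid search over the base coefficients, restricted by these constraints, can be sketched as follows. This is a toy enumeration: the actual search in the paper trains and evaluates a model for each candidate triple rather than just checking feasibility.

```python
import itertools

# Candidate values for the depth (alpha), width (beta), and resolution
# (gamma) coefficients; each must be >= 1 per the constraints above.
CANDIDATES = [1.0, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3]

def satisfies_constraint(alpha, beta, gamma, tol=0.1):
    # alpha * beta^2 * gamma^2 ~= 2, so increasing the compound
    # coefficient by one roughly doubles the network's FLOPS.
    return abs(alpha * beta ** 2 * gamma ** 2 - 2.0) <= tol

# With the compound coefficient fixed at 1, enumerate feasible triples;
# the paper then keeps the triple whose scaled model is most accurate.
feasible = [t for t in itertools.product(CANDIDATES, repeat=3)
            if satisfies_constraint(*t)]
print(f"{len(feasible)} feasible (alpha, beta, gamma) triples")
```

Note that the coefficients eventually reported in the paper (α = 1.2, β = 1.1, γ = 1.15) satisfy this constraint: 1.2 × 1.1² × 1.15² ≈ 1.92.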

We can describe the depth, width, and resolution scale factors as follows:

depth: d = α^Φ
width: w = β^Φ
resolution: r = γ^Φ

subject to α · β² · γ² ≈ 2, with α ≥ 1, β ≥ 1, γ ≥ 1
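A quick numeric check of these scaling rules, using the coefficient values reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15):

```python
# Compound scaling rules: depth d = alpha^phi, width w = beta^phi,
# resolution r = gamma^phi. Coefficient values are those reported
# in the EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for a given phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

for phi in range(4):  # phi = 0 leaves the base network unchanged
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.3f}, width x{w:.3f}, resolution x{r:.3f}")
```

Each increment of Φ multiplies the depth, width, and resolution factors by α, β, and γ respectively, so the network's cost grows roughly geometrically in Φ.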

In this architecture, Φ is set to 1 for the base EfficientNet architecture, assigning the scale coefficients as α = 1.2 ...