EfficientNet (2019)
Explore the EfficientNet convolutional neural network architecture, which uniformly scales network dimensions using a compound coefficient. Understand how the width, depth, and resolution factors are found through a small grid search to optimize performance. Compare EfficientNet versions and see how this approach improves accuracy while reducing model size and computation compared to other architectures.
General structure
EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all the dimensions of a network rather than scaling each dimension independently. The authors argue that a well-designed architecture should preserve the balance between its dimensions, and that we should not disturb this balance by tuning each dimension manually. For this purpose, they introduce a coefficient called the compound scale factor (φ) and use it to determine the final scale factor for each dimension.
The figure below shows the different scales of a convolutional neural network. Widening a network means increasing the number of channels in convolutional layers. Deepening a network means increasing the number of layers in a neural network. Finally, resolution is the size of the input image.
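To make these three dimensions concrete, here is a minimal sketch of how width, depth, and resolution multipliers could be applied to a baseline configuration. The stage channel counts and layer counts below are illustrative placeholders, not the actual EfficientNet-B0 stage table:

```python
import math

def scale_network(base_channels, base_layers, base_resolution, w, d, r):
    """Apply width (w), depth (d), and resolution (r) multipliers to a
    baseline configuration. Real implementations also round channels to
    hardware-friendly multiples; math.ceil is a simplification here."""
    channels = [math.ceil(c * w) for c in base_channels]   # wider: more channels
    layers = [math.ceil(n * d) for n in base_layers]       # deeper: more layers per stage
    resolution = math.ceil(base_resolution * r)            # larger input image
    return channels, layers, resolution

# Illustrative baseline values (hypothetical, not the B0 stage table)
print(scale_network([16, 24, 40], [1, 2, 2], 224, w=1.1, d=1.2, r=1.15))
```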
The main logic is to first fix the compound scale factor (φ) to 1 and apply a small grid search to find the base width, depth, and resolution coefficients. The next step is to fix these newly found coefficients and increase the compound scale factor φ to obtain the final scale factors for larger variants of the network.
Note: Some constraints were imposed to keep this search well behaved. All the scale coefficients must be greater than or equal to 1, and the product of the depth coefficient, the square of the width coefficient, and the square of the resolution coefficient should be approximately 2, so that the network's total FLOPS grow by roughly $2^{\phi}$ when scaled.
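A sketch of the first step of this search, assuming a placeholder score function that stands in for training and evaluating each scaled candidate (the paper measures ImageNet accuracy), might look like this:

```python
import itertools

def search_base_coefficients(score, step=0.05, tolerance=0.1):
    """Step 1: fix phi = 1 and grid-search the depth (alpha), width (beta),
    and resolution (gamma) coefficients. `score` is a placeholder for
    training and evaluating the network scaled by a candidate triple."""
    grid = [1.0 + i * step for i in range(9)]  # all candidates >= 1
    best, best_score = None, float("-inf")
    for alpha, beta, gamma in itertools.product(grid, repeat=3):
        # Constraint from the note above: alpha * beta^2 * gamma^2 ~= 2,
        # so each unit increase of phi roughly doubles the FLOPS.
        if abs(alpha * beta**2 * gamma**2 - 2.0) > tolerance:
            continue
        candidate_score = score(alpha, beta, gamma)
        if candidate_score > best_score:
            best, best_score = (alpha, beta, gamma), candidate_score
    return best
```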
We can describe the depth, width, and resolution scale factors as follows:

$$
\text{depth: } d = \alpha^{\phi}, \qquad
\text{width: } w = \beta^{\phi}, \qquad
\text{resolution: } r = \gamma^{\phi}
$$

subject to $\alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2$ and $\alpha \ge 1$, $\beta \ge 1$, $\gamma \ge 1$.
In this architecture, $\phi$ is set to $1$ for the base EfficientNet architecture, assigning the scale coefficients as $\alpha = 1.2$, $\beta = 1.1$, and $\gamma = 1.15$.
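As a quick check of these formulas, the snippet below evaluates the multipliers for a few values of $\phi$ using the base coefficients above (the larger EfficientNet variants also round the resulting dimensions, which this sketch omits):

```python
alpha, beta, gamma = 1.2, 1.1, 1.15  # base coefficients found with phi = 1

def compound_scale(phi):
    """Depth, width, and resolution multipliers for a given compound
    scale factor phi: d = alpha**phi, w = beta**phi, r = gamma**phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

for phi in (1, 2, 3):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
# phi=1: depth x1.20, width x1.10, resolution x1.15
# phi=2: depth x1.44, width x1.21, resolution x1.32
```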