What are MobileNet models?

Overview

Significant contributions in the field of image recognition and classification have been empowered by convolutional neural networks, with their improved architectures and increased depth (a measure of the number of layers in a neural network).

However, these advancements do not mean the techniques can be employed in time-critical settings, as there is a tradeoff between a model's accuracy and its computational cost.

MobileNet models take advantage of the property that many parameters in trained models are redundant and capture the same features within the images. Hence, there is significant room for pruning and reducing the number of parameters to produce much more time-efficient models that can run on lightweight, computationally constrained devices such as mobile phones. This is done precisely via depth-wise separable filters.

What are depth-wise separable filters?

Standard convolution techniques filter the input features and merge them into a new output, all in one step.

For $y$ output channels, $i$ input channels, a square input feature map of dimension $F$, and a square kernel of dimension $K$, the computational cost of a standard convolution can be computed as the following:

$$K \times K \times i \times y \times F \times F$$

Note: It follows that the computational cost grows multiplicatively with the number of input channels, number of output channels, dimensions of the input feature map and the kernel dimensions.
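To make this concrete, consider an illustrative layer (the numbers here are assumptions chosen only for this example): with a $3 \times 3$ kernel ($K = 3$), $i = 3$ input channels, $y = 64$ output channels, and a $224 \times 224$ feature map ($F = 224$), a single standard convolution costs $3 \times 3 \times 3 \times 64 \times 224 \times 224 \approx 86.7$ million multiply-accumulate operations.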

Depth-wise separable filters split this computation into two steps:

  1. Depth-wise convolution: The input is split along its $i$ channels to produce $i$ separate matrices, each of which is filtered individually with its own $K \times K$ kernel and can hence be processed in parallel.
  2. Point-wise convolution: The $i$ filtered representations are stacked and convolved with $y$ kernels of size $1 \times 1 \times i$ to merge them into an output with $y$ channels.

The sequential chaining of the two steps is illustrated in the diagram below:

[Diagram: depth-wise convolution followed by point-wise convolution]
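The two steps also map directly onto Keras layers. The following minimal sketch builds such a block; the 224 x 224 RGB input, the 3 x 3 kernel, and the 64 output channels are illustrative assumptions matching the worked numbers above, not values fixed by MobileNet:

import keras
from keras import layers

# Illustrative input: a 224 x 224 RGB feature map (i = 3 channels)
inputs = keras.Input(shape=(224, 224, 3))

# Step 1: depth-wise convolution -- each of the 3 input channels is
# filtered independently with its own 3 x 3 kernel
x = layers.DepthwiseConv2D(kernel_size=3, padding='same', use_bias=False)(inputs)

# Step 2: point-wise convolution -- 64 kernels of size 1 x 1 x 3 merge
# the filtered channels into a 64-channel output
outputs = layers.Conv2D(filters=64, kernel_size=1, use_bias=False)(x)

separable = keras.Model(inputs, outputs)

# A single standard convolution computing the same mapping, for comparison
standard = keras.Model(
    inputs,
    layers.Conv2D(filters=64, kernel_size=3, padding='same', use_bias=False)(inputs),
)

print(separable.count_params())  # 3*3*3 + 3*64 = 219 weights
print(standard.count_params())   # 3*3*3*64 = 1728 weights

Since the spatial size $F$ enters both operation counts identically, the weight ratio here (1728 / 219, roughly 7.9) equals the operation-count saving derived below.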

The computational cost for depth-wise convolution can be computed using the following equation, where $K$ refers to the dimension of the square kernel, $i$ refers to the number of input channels, and $F$ refers to the dimension of the input feature map:

$$K \times K \times i \times F \times F$$
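With the illustrative numbers above, the depth-wise step costs $3 \times 3 \times 3 \times 224 \times 224 \approx 1.35$ million operations.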

Like depth-wise convolution, point-wise convolution has a computational cost, shown in the following equation, where $i$ refers to the number of input channels, $y$ refers to the number of output channels, and $F$ refers to the dimension of the square input feature map:

$$i \times y \times F \times F$$
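For the same numbers, the point-wise step costs $3 \times 64 \times 224 \times 224 \approx 9.63$ million operations, which dominates the depth-wise cost.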

Therefore, the total cost can be expressed as the sum of the costs of depth-wise convolution and point-wise convolution:

$$K \times K \times i \times F \times F + i \times y \times F \times F$$
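Dividing this total by the cost of a standard convolution gives the reduction factor reported in the original MobileNet paper:

$$\frac{K \times K \times i \times F \times F + i \times y \times F \times F}{K \times K \times i \times y \times F \times F} = \frac{1}{y} + \frac{1}{K^2}$$

For the illustrative layer above, the total is $1.35 + 9.63 \approx 11$ million operations versus $86.7$ million for the standard convolution, roughly an $8\times$ saving, consistent with $1/64 + 1/9 \approx 0.127$.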

Code example

An example of classifying an input image with a pre-trained MobileNet is shown below:

import numpy as np
import keras
from keras.applications import imagenet_utils

def preprocess_image(input_file_name):
    # Load the image and resize it to the 224 x 224 input expected by MobileNet
    image_loaded = keras.utils.load_img(input_file_name, target_size=(224, 224))
    # Convert the PIL image to a NumPy array of shape (224, 224, 3)
    image_array = keras.utils.img_to_array(image_loaded)
    # Add a batch dimension so the shape becomes (1, 224, 224, 3)
    image_array_padded = np.expand_dims(image_array, axis=0)
    # Scale pixel values to the range MobileNet was trained on
    return keras.applications.mobilenet.preprocess_input(image_array_padded)

# Create a pre-trained MobileNet classifier
mobile = keras.applications.mobilenet.MobileNet()

def display_results(arr):
    # decode_predictions returns one list of (class_id, class_name, probability)
    # tuples per input image; we only passed a single image
    arr_pred = arr[0]
    for tuple_pred in arr_pred:
        print(tuple_pred[1], '--->', tuple_pred[2])

preprocessed_image = preprocess_image('input.jpg')
predicted_output = mobile.predict(preprocessed_image)
results_decoded = imagenet_utils.decode_predictions(predicted_output)
print("The predicted labels along with their respective probabilities are as follows:")
display_results(results_decoded)

Code explanation

  • Lines 1 to 3: We import the necessary NumPy and Keras modules.
  • Lines 5 to 13: We define a function to preprocess the input image: it is loaded and resized to 224 x 224, converted to an array, given a batch dimension, and scaled to the input range MobileNet expects.
  • Line 16: We create an object of class MobileNet() that is a pre-trained classifier.
  • Lines 18 to 23: We define a function to print each predicted label alongside its probability.
  • Lines 25 to 29: We preprocess the input image, use MobileNet()'s prediction function to render an appropriate prediction, decode the predicted ImageNet labels along with their respective probabilities, and print the results.
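By default, imagenet_utils.decode_predictions keeps the top five ImageNet classes per image, so display_results prints five label and probability pairs, ordered from most to least likely.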
