What are MobileNet models?
Overview
Significant contributions in the field of image recognition and classification have been empowered by convolutional neural networks, with architectures that have become increasingly refined and deep.
However, these advancements do not imply that the techniques can be employed in time-efficient settings, as there exists a tradeoff between the computational cost and the accuracy of a model.
MobileNet models take advantage of the property that many parameters in trained models are redundant and capture the same features within the images. Hence, there is significant room for pruning and reducing the number of parameters to produce much more time-efficient models that can run on lightweight, computationally constrained devices such as mobile phones. This is achieved via depth-wise separable filters.
What are depth-wise separable filters?
Standard convolution techniques filter the features and then merge them into a new output, all in one step.
For an input feature map of dimensions D_F x D_F with M channels, N output channels, and a kernel of dimensions D_K x D_K, the computational cost of standard convolution is:

D_K x D_K x M x N x D_F x D_F

Note: It follows that the computational cost grows multiplicatively with the number of input channels, the number of output channels, the dimensions of the input feature map, and the kernel dimensions.
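This multiplicative growth is easy to verify directly. The sketch below simply multiplies the four factors named in the note above; the example values (a 3 x 3 kernel, 16 input channels, 32 output channels, a 56 x 56 feature map) are arbitrary choices for illustration:

```python
# Multiply-accumulate count for a standard convolution:
# a D_K x D_K kernel over M input channels, producing N output
# channels at every position of a D_F x D_F feature map.
def standard_conv_cost(d_k, m, n, d_f):
    return d_k * d_k * m * n * d_f * d_f

# Example: 3x3 kernel, 16 -> 32 channels, 56x56 feature map
print(standard_conv_cost(3, 16, 32, 56))  # 14450688
```

Doubling any single factor (say, the number of output channels) doubles the cost, which is why reducing this product is the focus of depth-wise separable filters.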
Depth-wise separable filters articulate this computation into two steps:

- Depth-wise convolution: The input is split along its channel dimension to produce M separate matrices, each of which is filtered individually and can hence be processed in parallel.
- Point-wise convolution: The M filtered representations are stacked and convolved with a 1 x 1 kernel to merge them into one output representation.
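The two steps can be sketched directly in NumPy. This is an illustrative implementation, not an optimized one, and it assumes no padding and a stride of one:

```python
import numpy as np

def depthwise_separable_conv(x, depth_kernels, point_kernel):
    """x: (H, W, M); depth_kernels: (k, k, M); point_kernel: (M, N)."""
    H, W, M = x.shape
    k = depth_kernels.shape[0]
    out_h, out_w = H - k + 1, W - k + 1

    # Depth-wise step: each of the M channels is filtered
    # independently with its own k x k kernel.
    depth_out = np.zeros((out_h, out_w, M))
    for m in range(M):
        for i in range(out_h):
            for j in range(out_w):
                window = x[i:i + k, j:j + k, m]
                depth_out[i, j, m] = np.sum(window * depth_kernels[:, :, m])

    # Point-wise step: a 1 x 1 convolution merges the M filtered
    # channels into N output channels at every spatial position.
    return depth_out @ point_kernel  # shape (out_h, out_w, N)
```

For an 8 x 8 input with 3 channels, a 3 x 3 depth-wise kernel, and 5 output channels, the result has shape (6, 6, 5): the spatial dimensions shrink as in standard convolution, while the channel count is set by the point-wise step.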
The sequential chaining is illustrated in the diagram below:
The computational cost for depth-wise convolution can be computed using the following equation, where D_K is the kernel dimension, M is the number of input channels, and D_F is the dimension of the input feature map:

D_K x D_K x M x D_F x D_F
Like depth-wise convolution, point-wise convolution has a computational cost, shown in the following equation, where N is the number of output channels:

M x N x D_F x D_F
Therefore, the total cost can be expressed as the sum of the costs of depth-wise convolution and point-wise convolution:

D_K x D_K x M x D_F x D_F + M x N x D_F x D_F
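Dividing this total by the standard convolution cost shows the saving: the factors M and D_F x D_F cancel, leaving a reduction ratio of 1/N + 1/D_K². The quick check below uses arbitrary example values:

```python
import math

def standard_cost(d_k, m, n, d_f):
    return d_k * d_k * m * n * d_f * d_f

def separable_cost(d_k, m, n, d_f):
    depthwise = d_k * d_k * m * d_f * d_f  # filter each channel
    pointwise = m * n * d_f * d_f          # merge with a 1 x 1 kernel
    return depthwise + pointwise

d_k, m, n, d_f = 3, 16, 32, 56
ratio = separable_cost(d_k, m, n, d_f) / standard_cost(d_k, m, n, d_f)
print(math.isclose(ratio, 1 / n + 1 / d_k ** 2))  # True
```

With a 3 x 3 kernel and 32 output channels, the ratio is 1/32 + 1/9 ≈ 0.14, i.e., roughly an eight- to nine-fold reduction in multiply-accumulate operations.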
Code example
An example of classification for the input image is shown below:
```python
import numpy as np
import keras
from keras.applications import imagenet_utils

# Load the image, resize it to 224 x 224, and add a batch dimension
def preprocess_image(input_file_name):
    image_loaded = keras.utils.load_img(input_file_name, target_size=(224, 224))
    image_array = keras.utils.img_to_array(image_loaded)
    image_array_padded = np.expand_dims(image_array, axis=0)
    return keras.applications.mobilenet.preprocess_input(image_array_padded)

# Instantiate the pre-trained MobileNet classifier
mobile = keras.applications.mobilenet.MobileNet()

# Print each predicted label with its probability
def display_results(arr):
    for tuple_pred in arr[0]:
        print(tuple_pred[1], '--->', tuple_pred[2])

preprocessed_image = preprocess_image('input.jpg')
predicted_output = mobile.predict(preprocessed_image)
results_decoded = imagenet_utils.decode_predictions(predicted_output)
print("The predicted labels along with their respective probabilities are as follows:")
display_results(results_decoded)
```
Code explanation
- Lines 1 to 3: We import the necessary NumPy and Keras modules.
- Lines 6 to 10: We define a function that loads the input image, resizes it to 224 x 224, adds a batch dimension, and applies MobileNet's preprocessing.
- Line 13: We create an object of the MobileNet() class, which is a pre-trained classifier.
- Lines 16 to 18: We define a function that prints each predicted label along with its probability.
- Lines 20 to 24: We preprocess the input image, use MobileNet()'s predict() function to render an appropriate prediction, decode the result into human-readable labels, and display the possible image labels along with their respective probabilities.