What are MobileNet models?

Overview

Significant contributions in the field of image recognition and classification have been empowered by convolutional neural networks, with their improved architectures and increased depth (a measure of the number of layers in a neural network).

However, these advancements do not mean the techniques can be employed in time-critical settings, as there is a tradeoff between a model's accuracy and its computational cost.

MobileNet models take advantage of the property that many parameters in trained models are redundant and capture the same features within the images. Hence, there is significant room for pruning and reducing the number of parameters to produce much more time-efficient models that can run on lightweight, computationally constrained devices such as mobile phones. This is done precisely via depth-wise separable filters.

What are depth-wise separable filters?

Standard convolution techniques filter the input features and merge them into a new output, all in one step.

For $y$ output channels, $i$ input channels, a square input feature map of dimension $F$, and a square kernel of dimension $K$, the computational cost of a standard convolution can be computed as the following:

$$K \times K \times i \times y \times F \times F$$

Note: It follows that the computational cost grows multiplicatively with the number of input channels, number of output channels, dimensions of the input feature map and the kernel dimensions.
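To make this concrete, consider an illustrative layer (the numbers here are assumptions chosen only for this example): with a $3 \times 3$ kernel ($K = 3$), $i = 3$ input channels, $y = 64$ output channels, and a $224 \times 224$ feature map ($F = 224$), a single standard convolution costs $3 \times 3 \times 3 \times 64 \times 224 \times 224 \approx 86.7$ million multiply-accumulate operations.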

Depth-wise separable filters split this computation into two steps:

  1. Depth-wise convolution: The input is split along its $i$ channels to produce $i$ separate matrices, each of which is filtered individually with its own $K \times K$ kernel and can hence be processed in parallel.
  2. Point-wise convolution: The $i$ filtered representations are stacked and convolved with $y$ kernels of size $1 \times 1 \times i$ to merge them into an output with $y$ channels.

The sequential chaining of the two steps is illustrated in the diagram below:

[Diagram: depth-wise convolution followed by point-wise convolution]
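The two steps also map directly onto Keras layers. The following minimal sketch builds such a block; the 224 x 224 RGB input, the 3 x 3 kernel, and the 64 output channels are illustrative assumptions matching the worked numbers above, not values fixed by MobileNet:

import keras
from keras import layers

# Illustrative input: a 224 x 224 RGB feature map (i = 3 channels)
inputs = keras.Input(shape=(224, 224, 3))

# Step 1: depth-wise convolution -- each of the 3 input channels is
# filtered independently with its own 3 x 3 kernel
x = layers.DepthwiseConv2D(kernel_size=3, padding='same', use_bias=False)(inputs)

# Step 2: point-wise convolution -- 64 kernels of size 1 x 1 x 3 merge
# the filtered channels into a 64-channel output
outputs = layers.Conv2D(filters=64, kernel_size=1, use_bias=False)(x)

separable = keras.Model(inputs, outputs)

# A single standard convolution computing the same mapping, for comparison
standard = keras.Model(
    inputs,
    layers.Conv2D(filters=64, kernel_size=3, padding='same', use_bias=False)(inputs),
)

print(separable.count_params())  # 3*3*3 + 3*64 = 219 weights
print(standard.count_params())   # 3*3*3*64 = 1728 weights

Since the spatial size $F$ enters both operation counts identically, the weight ratio here (1728 / 219, roughly 7.9) equals the operation-count saving derived below.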

The computational cost for depth-wise convolution can be computed using the following equation, where $K$ refers to the dimension of the square kernel, $i$ refers to the number of input channels, and $F$ refers to the dimension of the input feature map:

$$K \times K \times i \times F \times F$$
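With the illustrative numbers above, the depth-wise step costs $3 \times 3 \times 3 \times 224 \times 224 \approx 1.35$ million operations.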

Like depth-wise convolution, point-wise convolution has a computational cost, shown in the following equation, where $i$ refers to the number of input channels, $y$ refers to the number of output channels, and $F$ refers to the dimension of the square input feature map:

$$i \times y \times F \times F$$
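For the same numbers, the point-wise step costs $3 \times 64 \times 224 \times 224 \approx 9.63$ million operations, which dominates the depth-wise cost.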

Therefore, the total cost can be expressed as the sum of the costs of depth-wise convolution and point-wise convolution:

$$K \times K \times i \times F \times F + i \times y \times F \times F$$
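Dividing this total by the cost of a standard convolution gives the reduction factor reported in the original MobileNet paper:

$$\frac{K \times K \times i \times F \times F + i \times y \times F \times F}{K \times K \times i \times y \times F \times F} = \frac{1}{y} + \frac{1}{K^2}$$

For the illustrative layer above, the total is $1.35 + 9.63 \approx 11$ million operations versus $86.7$ million for the standard convolution, roughly an $8\times$ saving, consistent with $1/64 + 1/9 \approx 0.127$.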

Code example

An example of classifying an input image with a pre-trained MobileNet is shown below:

import numpy as np
import keras
from keras.applications import imagenet_utils

def preprocess_image(input_file_name):
    # Load the image and resize it to the 224 x 224 input expected by MobileNet
    image_loaded = keras.utils.load_img(input_file_name, target_size=(224, 224))
    # Convert the PIL image to a NumPy array of shape (224, 224, 3)
    image_array = keras.utils.img_to_array(image_loaded)
    # Add a batch dimension so the shape becomes (1, 224, 224, 3)
    image_array_padded = np.expand_dims(image_array, axis=0)
    # Scale pixel values to the range MobileNet was trained on
    return keras.applications.mobilenet.preprocess_input(image_array_padded)

# Create a pre-trained MobileNet classifier
mobile = keras.applications.mobilenet.MobileNet()

def display_results(arr):
    # decode_predictions returns one list of (class_id, class_name, probability)
    # tuples per input image; we only passed a single image
    arr_pred = arr[0]
    for tuple_pred in arr_pred:
        print(tuple_pred[1], '--->', tuple_pred[2])

preprocessed_image = preprocess_image('input.jpg')
predicted_output = mobile.predict(preprocessed_image)
results_decoded = imagenet_utils.decode_predictions(predicted_output)
print("The predicted labels along with their respective probabilities are as follows:")
display_results(results_decoded)

Code explanation

  • Lines 1 to 3: We import the necessary NumPy and Keras modules.
  • Lines 5 to 13: We define a function to preprocess the input image: it is loaded and resized to 224 x 224, converted to an array, given a batch dimension, and scaled to the input range MobileNet expects.
  • Line 16: We create an object of class MobileNet() that is a pre-trained classifier.
  • Lines 18 to 23: We define a function to print each predicted label alongside its probability.
  • Lines 25 to 29: We preprocess the input image, use MobileNet()'s prediction function to render an appropriate prediction, decode the predicted ImageNet labels along with their respective probabilities, and print the results.
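By default, imagenet_utils.decode_predictions keeps the top five ImageNet classes per image, so display_results prints five label and probability pairs, ordered from most to least likely.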
