
Model Optimization and Quantization

Explore the principles of model optimization and quantization in TensorFlow Lite to reduce model size and improve inference speed on resource-constrained devices. Understand dynamic range, full integer, and float16 quantization options, their implementation, and trade-offs, enabling you to deploy more efficient deep learning models on mobile and embedded hardware.

The TF Lite converter generates lightweight models suitable for resource-constrained mobile and edge devices. We can make these models even more compact and faster at inference by applying the optimization and quantization techniques the framework offers, at the cost of a small reduction in model accuracy. Let's discuss the quantization process and the model optimization techniques offered by the TF Lite framework.
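For instance, here is a minimal conversion sketch. The model path `my_model.h5` is hypothetical, and `tf.lite.Optimize.DEFAULT` enables the converter's default optimization (dynamic range quantization of the weights):

```python
import tensorflow as tf

# Load a trained Keras model (hypothetical path, for illustration only).
model = tf.keras.models.load_model("my_model.h5")

# Create a TF Lite converter from the Keras model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Enable the default optimization: dynamic range quantization of the weights.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Convert and save the compact TF Lite flatbuffer.
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```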

Quantization

Quantization is the process of mapping input values drawn from a large set to output values in a smaller set. The input set can be infinite (continuous) or finite but large (for example, values stored using many bits). The following figure shows the quantization of a continuous information source.

Quantizing the information (red) by representing it in eight levels (blue)

Here, we use three bits to represent each quantized value (blue), giving a total of $2^3 = 8$ quantization levels.
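To make this concrete, the following NumPy sketch (outside TF Lite) uniformly quantizes a continuous signal into $2^3 = 8$ levels, mirroring the figure above; the function name and the sine-wave signal are illustrative, not part of any TF Lite API:

```python
import numpy as np

def quantize_uniform(x, num_bits=3):
    """Uniformly quantize a signal to 2**num_bits discrete levels."""
    levels = 2 ** num_bits                    # 8 levels for 3 bits
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (levels - 1)    # step size between adjacent levels
    codes = np.round((x - x_min) / scale)     # integer code for each sample, 0..7
    return codes.astype(np.uint8), x_min + codes * scale

# Quantize one period of a sine wave (the continuous source).
t = np.linspace(0, 2 * np.pi, 100)
signal = np.sin(t)
codes, reconstructed = quantize_uniform(signal)

print(codes[:8])                                # 3-bit codes in [0, 7]
print(np.abs(signal - reconstructed).max())     # worst-case quantization error
```

Each sample is stored as a small integer code, and the reconstruction differs from the original by at most half a step size, which is exactly the accuracy trade-off quantization makes in exchange for smaller storage.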