Inference Using the TF Lite Model
Explore how to use the TensorFlow Lite interpreter to load models, preprocess input data, run inference, and retrieve outputs in Android apps. Learn how to set options such as multi-threading and hardware acceleration to optimize model performance on mobile devices.
On-device inference is the process of running a TF Lite model to make predictions on new input data. TF Lite inference APIs are available for common mobile and embedded platforms such as Android. A TF Lite model runs through an interpreter that is optimized for resource-constrained devices: it uses a custom memory allocator, which keeps initialization and execution latency low. Let’s explore how to use the TF Lite interpreter to perform inference in an Android app.
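To make this concrete, here is a minimal Kotlin sketch of driving the TF Lite interpreter. It assumes the model is already loaded into a ByteBuffer and takes a single 224x224 RGB float image, producing 1,000 class scores; those shapes, the thread count, and the use of NNAPI are illustrative assumptions, not requirements of the library.

```kotlin
import org.tensorflow.lite.Interpreter
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Assumed shapes for illustration: a 1x224x224x3 float input and a
// 1x1000 float output (e.g., an image classifier).
fun runInference(modelBuffer: ByteBuffer): FloatArray {
    // Interpreter options: run CPU kernels on four threads and let NNAPI
    // accelerate supported ops on devices that provide it.
    val options = Interpreter.Options().apply {
        setNumThreads(4)
        setUseNNAPI(true)
    }
    val interpreter = Interpreter(modelBuffer, options)

    // Placeholder input buffer sized for 1 * 224 * 224 * 3 floats (4 bytes
    // each); in practice it would be filled with preprocessed image data.
    val input = ByteBuffer.allocateDirect(1 * 224 * 224 * 3 * 4)
        .order(ByteOrder.nativeOrder())
    // Output array that the interpreter fills with the model's predictions.
    val output = Array(1) { FloatArray(1000) }

    // run() feeds the input tensor, executes the graph, and writes the result.
    interpreter.run(input, output)
    interpreter.close()
    return output[0]
}
```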
Main steps
The following figure shows the main steps for performing inference with the TF Lite interpreter.
Load: First, we load a TF Lite model (.tflite extension), which contains the model’s execution graph, into memory.
Transform: We then transform the data into a format the TF Lite model accepts. For instance, we may have to resize and rescale the input according to the shape the model expects, and we might also need to change the input’s data type to match the model’s requirements (see the sketch after this list). ...
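As a rough illustration of the load and transform steps, the sketch below memory-maps a model bundled in the app's assets and converts a Bitmap into the float input buffer a typical image model expects. The asset name, input size, and rescaling to the [0, 1] range are assumptions made for the example; a given model's requirements may differ.

```kotlin
import android.content.Context
import android.graphics.Bitmap
import java.io.FileInputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

// Load: memory-map the .tflite file from assets so it can be handed to the
// interpreter. The asset name is a placeholder for this example.
fun loadModel(context: Context, assetName: String = "model.tflite"): MappedByteBuffer {
    val fd = context.assets.openFd(assetName)
    FileInputStream(fd.fileDescriptor).channel.use { channel ->
        return channel.map(FileChannel.MapMode.READ_ONLY, fd.startOffset, fd.declaredLength)
    }
}

// Transform: resize the bitmap to the assumed input size and rescale each
// pixel channel from 0..255 integers to 0.0..1.0 floats.
fun preprocess(bitmap: Bitmap, inputSize: Int = 224): ByteBuffer {
    val resized = Bitmap.createScaledBitmap(bitmap, inputSize, inputSize, true)
    val buffer = ByteBuffer.allocateDirect(1 * inputSize * inputSize * 3 * 4)
        .order(ByteOrder.nativeOrder())
    val pixels = IntArray(inputSize * inputSize)
    resized.getPixels(pixels, 0, inputSize, 0, 0, inputSize, inputSize)
    for (pixel in pixels) {
        buffer.putFloat(((pixel shr 16) and 0xFF) / 255.0f)  // red channel
        buffer.putFloat(((pixel shr 8) and 0xFF) / 255.0f)   // green channel
        buffer.putFloat((pixel and 0xFF) / 255.0f)           // blue channel
    }
    return buffer
}
```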