Optimize a Quantized Model for an Android Target¶
Mobile phones are ubiquitous edge devices with AI capabilities, and many of them run the Android operating system. Quantized models are popular on mobile devices because they offer faster inference and lower energy consumption. A small accuracy compromise is generally acceptable on these devices in exchange for real-time inference, and their hardware commonly provides optimized integer execution. This tutorial provides step-by-step instructions for importing a quantized mobile object detection model from Kaggle into Forge, applying optimizations, and creating an artifact ready for deployment.
Environment setup¶
We will use Kaggle to download a quantized object detection model built for mobile devices.
! pip install kagglehub
! pip install tflite
Download a model and compile it for Android¶
We will download the model from Kaggle and load it with TensorFlow Lite (tflite).
import kagglehub
# Download latest version
path = kagglehub.model_download("iree/ssd-mobilenet-v1/tfLite/100-320-uint8-nms")
print("Path to model files:", path)
from pathlib import Path
import tflite
tflite_model_file = Path(path) / "1.tflite"
tflite_model_buf = open(tflite_model_file, "rb").read()
tflite_model = tflite.Model.GetRootAsModel(tflite_model_buf, 0)
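As a quick sanity check that the buffer parsed correctly, you can query the flatbuffers-generated accessors on the model object. The method names below (Version(), SubgraphsLength()) are assumptions based on a typical release of the tflite package and may differ in other versions.
# Optional sanity check using the flatbuffers-generated tflite accessors.
print("TFLite schema version:", tflite_model.Version())
print("Number of subgraphs:", tflite_model.SubgraphsLength())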
Then we will load it into Forge for optimization.
import forge
ir = forge.from_tflite(tflite_model)
You can use the forge.IRModule class to introspect the model.
We can see that this model expects uint8 inputs.
ir.input_dtypes
We can also verify that the model is quantized.
ir.is_quantized
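If you are scripting this flow end to end, a minimal sketch of guarding the compile step on these checks is shown below. It relies only on the is_quantized and input_dtypes attributes used above, and it assumes input_dtypes yields dtype names that compare equal to "uint8".
# A minimal guard before compiling (assumes input_dtypes yields dtype names as strings).
assert ir.is_quantized, "expected a quantized model"
assert all(str(dtype) == "uint8" for dtype in ir.input_dtypes), "expected uint8 inputs"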
We will set the target and create a directory to save the compiled output.
target = "android/cpu"
import os
optimized_model_dir = "detector_quantized"
if not os.path.exists(optimized_model_dir):
    os.makedirs(optimized_model_dir)
Now you can compile the model.
ir.compile(target=target, output_path=optimized_model_dir, force_overwrite=True, export_metadata=True)
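You can list the output directory to see what was produced. The exact file names depend on the Forge version and target, but they include the compiled library (modelLibrary.so) that we reference later in this tutorial.
# Inspect the compiled artifacts written by ir.compile.
print(sorted(os.listdir(optimized_model_dir)))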
The compiled model is all you need to run inference using the Android LRE.
If you're creating your own Android application with our SDK, this compiled model library is the only artifact you need.
To deploy with the sample application, however, we also need some additional information about the model and a selection of model-specific processing functions available in Kotlin. Let's package them with our compiled model.
We need the labels associated with the class encoding to make sense of our class detections. We will download the labels our model was trained with and place them in our model directory.
from requests import get
url = 'https://raw.githubusercontent.com/amikelive/coco-labels/refs/heads/master/coco-labels-paper.txt'
filename = f"{optimized_model_dir}/coco-labels-paper.txt"
with open(filename, "wb") as file:
response = get(url)
file.write(response.content)
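It is worth a quick check that the label file downloaded correctly, for example by counting the entries and printing the first few.
# Sanity check: count and preview the class labels we just downloaded.
with open(filename) as label_file:
    labels = [line.strip() for line in label_file if line.strip()]
print(len(labels), "labels, for example:", labels[:5])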
We also need to create a manifest file for the Android SDK that communicates model creation and compile time information to the application.
import json
def create_deploy_manifest(manifest_path, model_name, label_name, preprocessor_path, postprocessor_path, output_ctx=None):
    data = {
        "object": model_name,
        "labels": label_name,
        "inference": {
            "preprocessor": preprocessor_path,
            "postprocessor": postprocessor_path
        }
    }
    if output_ctx:
        data["inference"]["output_ctx"] = output_ctx
    with open(manifest_path, 'w') as json_file:
        json.dump(data, json_file, indent=4)
First, we add information about the artifacts we created.
model_name is the name of our model library.
label_name is the file we created during label generation.
manifest_path is where we want to place the manifest. We call it deploy_manifest.json because the application looks for that file name, and we place it in the optimized artifacts directory along with the other files.
manifest_path = f"{optimized_model_dir}/deploy_manifest.json"
model_name = "modelLibrary.so"
label_name = "coco-labels-paper.txt"
Our sample application already provides some data processing functions. This particular model uses the following ones. You can find the available functions in the Android sample application on our GitHub page.
preprocessor_path = "io.latentai.android.lre.sample.processor.DetectorPreprocessor"
postprocessor_path = "io.latentai.android.lre.sample.processor.DetectorMnetNMSPostprocessor"
We keep an additional field to pass a description of any additional context an Android developer may need about the selected model. For this model, we add an explanation of what each output means.
output_ctx = """This model has 4 outputs:
output 0 are the selected bounding box coordinates in a normalized scale
output 1 are the label indices
output 2 are the confidence scores of each candidate
output 3 is the number of candidates selected which is pre-set to 10"""
We use the manifest creation function we just defined to generate the manifest.
create_deploy_manifest(manifest_path, model_name, label_name, preprocessor_path, postprocessor_path, output_ctx=output_ctx)
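You can read the manifest back to verify that it contains the fields the sample application expects.
# Verify the generated manifest contents.
with open(manifest_path) as json_file:
    print(json.dumps(json.load(json_file), indent=4))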
Once the model is compiled and the artifacts are generated, you can take this output to the target device and load it with the Android sample application.
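One way to copy the artifacts to a connected device is with adb. The destination path below is only an example; where the sample application actually reads the files from depends on your setup.
# Example only: copy the whole artifacts directory to a connected Android device.
! adb push detector_quantized /data/local/tmp/detector_quantized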