Using PyLRE to Deploy your Exported ONNX Model
You have exported an ONNX model with LEIP Optimize, and now you want to deploy it in a target environment. This tutorial provides step-by-step instructions for loading an optimized artifact, creating an LRE instance, and performing inference.
Runtime Setup
Two components are needed to execute a model on your target:
- a target-compatible and model-compatible runtime (LRE)
- a target-compatible model or model library (the optimized output)
import pylre
from pylre import LatentRuntimeEngine as LRE
import numpy as np
Obtain your optimized artifact from LEIP Optimize. This tutorial assumes the model is compiled for float32 precision and a CPU target. For more information, consult the LEIP Optimize tutorial for optimizing an ONNX model.
optimized_artifact_path = "path/to/exported.onnx"
pylre_options = pylre.ONNXOptions(execution_provider="cpu", precision="float32")
lre = LRE(optimized_artifact_path, options=pylre_options)
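Before constructing the LRE, it can help to confirm that the artifact path actually points to a file, since a wrong path is a common cause of load failures. The sketch below uses only the Python standard library. The helper name `resolve_artifact` is hypothetical and is not part of PyLRE.

```python
from pathlib import Path


def resolve_artifact(path_str: str) -> Path:
    """Return the absolute path to the artifact, or raise if it is not a file.

    Hypothetical helper for illustration; not part of the PyLRE API.
    """
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Optimized artifact not found: {path}")
    return path
```

One way to use it is to pass `str(resolve_artifact("path/to/exported.onnx"))` as the artifact path, so a missing file fails early with a clear message.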
With this LRE object, we can inspect the model we have optimized:
lre.get_metadata()
Creating a random tensor to do inference
Because the model expects only one input, we use the first input's shape and dtype to create a random input tensor.
shape = lre.input_shapes[0]
dtype = lre.input_dtypes[0]
# Draw float64 values in [0, 1), then cast to the model's input dtype
input_tensor = np.random.random(shape).astype(dtype)
With this input tensor, we can run inference on the LRE instance we created.
output = lre(input_tensor)
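When comparing runs, a seeded random input makes results repeatable. The following is a minimal sketch using NumPy's `Generator` API. The helper `make_random_input` and the example shape are assumptions for illustration, and the helper assumes every dimension is a concrete positive integer rather than a dynamic one.

```python
import numpy as np


def make_random_input(shape, dtype, seed=0):
    """Build a reproducible random tensor with the given shape and dtype.

    Hypothetical helper; assumes every dimension in `shape` is a concrete
    positive integer (no dynamic or symbolic dimensions).
    """
    rng = np.random.default_rng(seed)
    # rng.random draws float64 values in [0, 1); cast to the requested dtype.
    return rng.random(tuple(int(d) for d in shape)).astype(dtype)


# Example with an assumed image-like input shape.
example = make_random_input((1, 3, 224, 224), "float32")
```

In the tutorial's flow, the shape and dtype would come from the LRE object instead of being hard-coded.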
This output is in a device-independent format. You may want to convert it into a format that is more amenable to postprocessing. We will use NumPy for this, but depending on your application and hardware usage, you may want to explore other formats.
numpy_output = np.from_dlpack(output[0])
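To see what `np.from_dlpack` does in isolation, the sketch below round-trips a plain NumPy array through the DLPack protocol. The NumPy array is only a stand-in for a model output, and the example assumes a NumPy version that provides `np.from_dlpack`.

```python
import numpy as np

# Stand-in for a model output: any object implementing __dlpack__ works here.
producer = np.arange(6, dtype=np.float32).reshape(2, 3)

# np.from_dlpack consumes the DLPack capsule exposed by the producer.
consumer = np.from_dlpack(producer)
```

The resulting array keeps the producer's shape and dtype, so the usual NumPy postprocessing applies to it directly.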
Verifying expected output shape
expected_output_shape = tuple(lre.output_shapes[0])
# Compare as tuples in case the runtime reports the shape as a list
assert numpy_output.shape == expected_output_shape
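Because `assert` statements are skipped when Python runs with the `-O` flag, an explicit check may be preferable in deployed code. The following is a minimal sketch, and the helper `check_output_shape` and the example shape are hypothetical.

```python
import numpy as np


def check_output_shape(array, expected_shape):
    """Raise ValueError with a readable message if the shapes differ.

    Hypothetical helper; normalizes both sides to tuples so that a list
    such as [1, 1000] compares equal to the NumPy shape (1, 1000).
    """
    actual = tuple(array.shape)
    expected = tuple(int(d) for d in expected_shape)
    if actual != expected:
        raise ValueError(f"Output shape {actual} does not match expected {expected}")


# Passes silently: the shapes match once both are treated as tuples.
check_output_shape(np.zeros((1, 1000), dtype=np.float32), [1, 1000])
```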