
forge.onnx.inference

forge.onnx.inference.get_inference_function

get_inference_function(model: ModelProto | str | bytes | PathLike, providers: str | None = None, opt_level: str | None = None, **kwargs) -> Callable

Creates a LEIP Runtime Engine inference function from the given model.

This function loads an ONNX model and returns a callable inference function that can be used to run predictions. The returned function automatically handles input and output names and shapes, and executes inference using the specified execution provider and optimization level.
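The snippet below is a minimal end-to-end sketch, not taken from the library's documentation: it assumes a local "model.onnx" file, that a ModelProto may be passed directly (see the model parameter), and that NumPy arrays are accepted as DLPack-compatible inputs; the input shape is illustrative.

import numpy as np
import onnx

from forge.onnx.inference import get_inference_function

# A ModelProto can be passed in place of a file path.
model = onnx.load("model.onnx")
inference_fn = get_inference_function(model, providers="cpu", opt_level="basic")

# Illustrative input; the shape and dtype must match the model's actual input.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Outputs are returned as a dictionary keyed by output name.
outputs = inference_fn(x)
print(list(outputs))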

Parameters:

model : ModelProto | str | bytes | PathLike (required)
    The ONNX model to be loaded. Can be a ModelProto object, a path to the model file, a serialized model in bytes, or an os.PathLike object representing a file path.

providers : str, optional (default: None)
    The execution provider to use for inference. Can be "cpu", "cuda", or "tensorrt". If not provided, defaults to the best available provider.

opt_level : str, optional (default: None)
    The level of graph optimization to apply during model loading. One of "disable", "basic", "extended", or "all".

**kwargs
    Additional keyword arguments to pass to ONNXOptions.

Returns:

Callable
    A callable function that takes DLPack-compatible tensor(s) as input and returns a dictionary mapping output names to their corresponding DLPack-compatible tensors. The returned function also carries metadata attributes such as input_names, input_shapes, output_names, output_shapes, and session.

Example

from forge.onnx.inference import get_inference_function

inference_fn = get_inference_function("model.onnx", providers="cuda")
# input_tensor must be a DLPack-compatible tensor matching the model's input.
output = inference_fn(input_tensor)
output_name = inference_fn.output_names[0]
print(output[output_name])
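The metadata attributes described under Returns can be read without running inference; a short sketch, reusing inference_fn from the example above:

# Inspect the resolved input/output metadata attached to the returned callable.
print(inference_fn.input_names, inference_fn.input_shapes)
print(inference_fn.output_names, inference_fn.output_shapes)

# The underlying runtime session is also exposed via the session attribute.
engine = inference_fn.session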

Raises:

TypeError
    If the model type is not recognized.

forge.onnx.inference.get_inference_session

get_inference_session(model: ModelProto | str | bytes | PathLike, providers: str | None = None, opt_level: str | None = None, **kwargs) -> LatentRuntimeEngine

Creates a LEIP Runtime Engine inference session.

This helper function initializes and returns a LEIP Runtime Engine with the specified model, execution provider, and optimization level. Additional options can be set via keyword arguments.
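As a hedged sketch of the other accepted model forms, the example below passes a serialized model from memory; reading the bytes with pathlib is illustrative rather than required.

from pathlib import Path

from forge.onnx.inference import get_inference_session

# A serialized model (bytes) is accepted alongside file paths and ModelProto objects.
model_bytes = Path("model.onnx").read_bytes()
session = get_inference_session(model_bytes, providers="CPU")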

Parameters:

Name Type Description Default
model Union[ModelProto, str, bytes, PathLike]

The ONNX model to load. Can be a ModelProto object, a path to the model file, a serialized model in bytes, or an os.PathLike object representing a file path.

required
providers Optional[str]

The execution provider to use. Can be "CPU", "CUDA", or "TRT". If not provided, defaults to "CPU".

None
opt_level Optional[str]

The level of graph optimization to apply. Refer to LEIP Runtime documentation for valid optimization levels.

None
**kwargs

Additional keyword arguments to pass to ONNXOptions.

{}

Returns:

LatentRuntimeEngine
    The initialized LEIP Runtime Engine that can be used for running inference on the provided model.

Example

from forge.onnx.inference import get_inference_session

session = get_inference_session("model.onnx", providers="CUDA")
# input_tensor must be a DLPack-compatible tensor matching the model's input.
session.infer(input_tensor)
outputs = session.get_outputs()
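A single session can be reused across many inputs; a minimal sketch where batch_of_tensors is a placeholder for your own DLPack-compatible tensors:

# Reuse one engine instead of recreating it for every prediction.
for input_tensor in batch_of_tensors:
    session.infer(input_tensor)
    results = session.get_outputs()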

Raises:

TypeError
    If the model type is not recognized.