forge.onnx.inference
forge.onnx.inference.get_inference_function
get_inference_function(model: ModelProto | str | bytes | PathLike, providers: str | None = None, opt_level: str | None = None, **kwargs) -> Callable
Creates a LEIP Runtime Engine inference function from the given model.
This function loads an ONNX model and returns a callable inference function that can be used to run predictions. The returned function automatically handles input and output names and shapes, and executes inference using the specified execution provider and optimization level.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Union[ModelProto, str, bytes, PathLike]` | The ONNX model to be loaded. Can be an in-memory `ModelProto`, a path to an ONNX file (`str` or `PathLike`), or serialized model `bytes`. | required |
| `providers` | `Optional[str]` | The execution provider to use for inference. Can be `"cpu"`, `"cuda"`, or `"tensorrt"`. If not provided, defaults to the best available provider. | `None` |
| `opt_level` | `Optional[str]` | The level of graph optimization to apply during model loading. One of `"disable"`, `"basic"`, `"extended"`, or `"all"`. | `None` |
| `**kwargs` | | Additional keyword arguments to pass to `ONNXOptions`. | `{}` |
Returns:

| Name | Type | Description |
|---|---|---|
| `Callable` | `Callable` | A callable function that takes DLPack-compatible tensor(s) as input and returns a dictionary mapping output names to their corresponding DLPack-compatible tensors. The function has additional metadata attributes such as `output_names`. |
Example
inference_fn = get_inference_function("model.onnx", providers="cuda")
output = inference_fn(input_tensor)
output_name = inference_fn.output_names[0]
print(output[output_name])
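A slightly fuller sketch, assuming NumPy arrays are accepted as DLPack-compatible inputs (NumPy 1.22+ implements the DLPack protocol); the model path and input shape below are placeholders, and the exact tensor types accepted depend on the LEIP Runtime build:

```python
import numpy as np

from forge.onnx.inference import get_inference_function

# Placeholder model path; substitute your own ONNX file.
inference_fn = get_inference_function("model.onnx", providers="cpu", opt_level="basic")

# Placeholder shape/dtype; NumPy arrays are DLPack-compatible on NumPy >= 1.22.
dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)

outputs = inference_fn(dummy_input)  # dict mapping output names to tensors

for name in inference_fn.output_names:  # metadata attribute shown in the example above
    print(name, outputs[name])
```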
Raises:

| Type | Description |
|---|---|
| `TypeError` | If the model type is not recognized. |
forge.onnx.inference.get_inference_session
get_inference_session(model: ModelProto | str | bytes | PathLike, providers: str | None = None, opt_level: str | None = None, **kwargs) -> LatentRuntimeEngine
Creates a LEIP Runtime Engine inference session.
This helper function initializes and returns a LEIP Runtime Engine with the specified model, execution provider, and optimization level. Additional options can be set via keyword arguments.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Union[ModelProto, str, bytes, PathLike]` | The ONNX model to load. Can be an in-memory `ModelProto`, a path to an ONNX file (`str` or `PathLike`), or serialized model `bytes`. | required |
| `providers` | `Optional[str]` | The execution provider to use. Can be `"CPU"`, `"CUDA"`, or `"TRT"`. If not provided, defaults to `"CPU"`. | `None` |
| `opt_level` | `Optional[str]` | The level of graph optimization to apply. Refer to the LEIP Runtime documentation for valid optimization levels. | `None` |
| `**kwargs` | | Additional keyword arguments to pass to `ONNXOptions`. | `{}` |
Returns:

| Name | Type | Description |
|---|---|---|
| `LatentRuntimeEngine` | `LatentRuntimeEngine` | The initialized LEIP Runtime Engine that can be used for running inference on the provided model. |
Example
session = get_inference_session("model.onnx", providers="CUDA")
session.infer(input_tensor)
outputs = session.get_outputs()
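A minimal end-to-end sketch, using only the calls shown above (`infer` and `get_outputs`); the model path and input shape are placeholders, and NumPy arrays are assumed to be accepted as input tensors:

```python
import numpy as np

from forge.onnx.inference import get_inference_session

# Placeholder model path; substitute your own ONNX file.
session = get_inference_session("model.onnx", providers="CPU")

dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder shape/dtype
session.infer(dummy_input)       # run inference on the engine
outputs = session.get_outputs()  # retrieve the resulting output tensors
print(outputs)
```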
Raises:

| Type | Description |
|---|---|
| `TypeError` | If the model type is not recognized. |