IRModule Class¶
forge.IRModule¶
forge.IRModule(mod: Relay.IRModule.obj, params: Optional[Dict[str, Union[Relay.NDArray.obj, np.ndarray]]] = None, inline_partitions: bool = False, fold_constants: bool = True, fold_batch_norms: bool = True)
¶
IRModule in Forge, an extension of TVM's Relay.IRModule, facilitates advanced manipulation, optimization, and compilation of machine learning models.
This class provides a user-friendly, graph-based interface for the Relay Intermediate Representation (IR), making it easier to work with than the standard TVM IRModule. It is designed to accommodate both beginners and expert users in machine learning, offering tools for model calibration, optimization, and quantization, and serving as the entry point for direct graph manipulation.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
mod | obj | A Relay IRModule, i.e. TVM-IRModule | required |
params | Dict[str, Union[obj, ndarray]] | The weights of the IRModule, defaults to None. | None |
inline_partitions | bool | Inlines all partitions during initialization if enabled, defaults to False. | False |
fold_constants | bool | Folds all constant branches during initialization if enabled, defaults to True. | True |
fold_batch_norms | bool | Folds all batch-norm operators during initialization if enabled, defaults to True. | True |
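A minimal usage sketch, assuming a Relay module obtained from one of TVM's frontends; the ONNX file path and the input name/shape below are placeholders, not values prescribed by Forge:

```python
# Load an ONNX model with TVM's Relay frontend and wrap it in a Forge IRModule.
import onnx
from tvm import relay
import forge

onnx_model = onnx.load("model.onnx")              # hypothetical model path
shape_dict = {"input": (1, 3, 224, 224)})         # assumed input name and shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Defaults fold constants and batch norms during initialization.
ir_mod = forge.IRModule(mod, params=params)
print(ir_mod.input_shapes, ir_mod.output_count)
```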
graphs: Dict[str, Graph] = {gv.name_hint: Graph(relay_expr=func_expr, params=params, sink_name=gv.name_hint) for (gv, func_expr) in mod.functions.items()}
class-attribute
instance-attribute
¶
fingerprint: str
property
¶
Deterministic hash of an IRModule's structure and data.
To get a structural hash (excluding data), use the hash() function.
params: Dict[str, Relay.NDArray.obj]
property
¶
The weights of the IRModule that are not "frozen" into the graph
input_count: int
property
¶
IRModule's number of expected inputs
input_shapes: List[Sequence[int]]
property
¶
List of IRModule's input shapes
input_dtypes: List[str]
property
¶
List of IRModule's input data types
input_nodes: List[Node]
property
¶
The IRModule's computational graph's input Node
objects
output_count: int
property
¶
IRModule's number of expected outputs
output_shapes: List[Sequence[int]]
property
¶
List of IRModule's output shapes
output_dtypes: List[str]
property
¶
List of IRModule's output data types
output_node: Node
property
¶
The IRModule's computational graph's output Node
object
graph_count: int
property
¶
Number of computational graphs in the IRModule, includes the "main" graph
Note: This is a strictly positive number.
subgraph_count: int
property
¶
Number of computational subgraphs in the IRModule, excludes the "main" graph
Note: This is a strictly non-negative number.
operators: Dict[str, int]
property
¶
Full count of the IRModule's operators
mod: Relay.IRModule.obj
property
¶
Relay IRModule (i.e. TVM-IRModule) without type-annotations
typed_mod: Relay.IRModule.obj
property
¶
Relay IRModule (i.e. TVM-IRModule) with type-annotations
Note: This property is a quick means of validating the "correctness" of a well-formed Relay.IRModule. It will throw a TVMError for an invalid IRModule.
main: Graph
property
¶
The IRModule's "main" computational Graph
object
is_tensorrt: bool
property
¶
Flag to check if IRModule is partitioned for TensorRT or not
is_calibrated: bool
property
¶
Flag to check if IRModule is calibrated or not
is_quantized: bool
property
¶
Flag to check if IRModule is quantized or not
is_split_tensors: bool
property
¶
Flag to check if IRModule has been tensor-split or not
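The read-only properties above combine naturally for quick model inspection. A small sketch, assuming ir_mod was constructed as in the earlier example:

```python
# Inspect the module before transforming it.
print("inputs:", ir_mod.input_count, ir_mod.input_shapes, ir_mod.input_dtypes)
print("outputs:", ir_mod.output_count, ir_mod.output_shapes, ir_mod.output_dtypes)
print("graphs:", ir_mod.graph_count, "subgraphs:", ir_mod.subgraph_count)
print("operator counts:", ir_mod.operators)
print("structural hash:", hash(ir_mod), "fingerprint:", ir_mod.fingerprint)
print("quantized?", ir_mod.is_quantized, "tensorrt?", ir_mod.is_tensorrt)
```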
copy() -> IRModule
¶
Get a deepcopy of the IRModule.
Copying an IRModule is useful for creating a "checkpoint" of the IRModule, especially before any in-place transformations such as quantization or partitioning.
Returns:

Type | Description |
---|---|
IRModule | A duplicate copy of the IRModule |
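A checkpointing sketch; the accuracy check is a hypothetical user-defined placeholder, not part of the Forge API:

```python
# Take a "checkpoint" before an in-place transformation such as quantization.
checkpoint = ir_mod.copy()
ir_mod.quantize()                     # modifies ir_mod in place

accuracy_is_acceptable = False        # placeholder for a user-defined accuracy check
if not accuracy_is_acceptable:
    ir_mod = checkpoint               # roll back to the pre-quantization module
```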
get_inference_function(target: str = 'llvm') -> Callable
¶
Get a Python callable that emulates the model's inference
This can be useful for debugging purposes and for validating accuracy/correctness. The returned callable is not an optimized compilation and should not be used to measure optimized latency. A user of the returned callable should provide NumPy arrays as inputs, and can expect a NumPy array or a list of NumPy arrays as output.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
target | str | A string that corresponds to the desired device target. A user typically should not need to explicitly set this (unless they really wish to run on GPU, i.e. "cuda"). Defaults to "llvm". | 'llvm' |
Returns:

Type | Description |
---|---|
Callable | Inference function |
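A sanity-checking sketch, assuming the returned callable accepts one positional NumPy array per model input (random data is only a stand-in for real pre-processed samples):

```python
import numpy as np

# Emulate inference on CPU ("llvm"); not intended for latency measurement.
infer = ir_mod.get_inference_function()
dummy_inputs = [np.random.rand(*shape).astype(dtype)
                for shape, dtype in zip(ir_mod.input_shapes, ir_mod.input_dtypes)]
outputs = infer(*dummy_inputs)        # NumPy array or list of NumPy arrays
```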
set_batch_size(batch_size: int) -> None
¶
Sets the IRModule's batch-size
Parameters:

Name | Type | Description | Default |
---|---|---|---|
batch_size | int | The desired batch-size for the model | required |

Returns:

Name | Type | Description |
---|---|---|
None | None | This method operates in place. |
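A short sketch, assuming the leading dimension of each input is the batch dimension:

```python
# Rewrite the module's input shapes for a new batch size (in place).
ir_mod.set_batch_size(8)
print(ir_mod.input_shapes)    # the leading dimension should now be 8
```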
partition_for_tensorrt(remove_stacks: bool = True, simplify_batch_matmul: bool = True, simplify_scalar_add: bool = True, remove_no_mac_subgraphs: bool = True) -> None
¶
Partitions an IRModule for TensorRT optimization by identifying and separating TensorRT-compatible subgraphs from the "main" computational graph.
This function analyzes the computational graph within the given IRModule to identify subgraphs that can be optimized using TensorRT. It then partitions the graph, isolating these TensorRT-compatible subgraphs. The partitioning process ensures that only supported operations are included in these subgraphs, while the remaining graph continues to be handled by the default execution engine.
Note: This method will flag an IRModule so that the IRModule.is_tensorrt flag will be true. To undo the partitioning, use the IRModule.inline_partitions() method.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
remove_stacks | bool | Flag to run passes that remove stack operations. Defaults to True. | True |
simplify_batch_matmul | bool | Flag to run a pass that will simplify "static" batch matmuls. Defaults to True. | True |
simplify_scalar_add | bool | Flag to run a pass that will convert the values of scalar adds into "broadcasted" tensors. This is to circumvent a limitation of the TVM-TensorRT bridge, which doesn't accept scalar adds in its subgraphs. Defaults to True. | True |
remove_no_mac_subgraphs | bool | Flag to remove any subgraphs that don't contain multiply-accumulate (MAC) operators. Defaults to True. | True |
Returns:

Name | Type | Description |
---|---|---|
None | None | This method operates in place. |
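A partitioning sketch using the default pass options; the subsequent inline_partitions() call shows how to revert the transformation:

```python
# Partition TensorRT-compatible subgraphs out of the "main" graph (in place).
ir_mod.partition_for_tensorrt()
print(ir_mod.is_tensorrt, ir_mod.subgraph_count)

# Revert the partitioning when needed.
ir_mod.inline_partitions()
print(ir_mod.is_tensorrt)
```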
split_tensors(force: bool = False) -> None
¶
Performs tensor splitting on the weight tensors of convolution layers.
This process divides a large weight tensor into two smaller tensors, which can facilitate parallel computation and aid in maintaining or improving quantization accuracy by allowing for more fine-grained parameterization over the quantization of different parts of the tensor.
Note: This method will flag an IRModule so that the IRModule.is_split_tensors flag will be true.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
force | bool | If False, the operation will raise a ValueError when previously captured calibration data is detected. When True, the operation will wipe previously recorded calibration data. Default is False. | False |
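A minimal sketch; force=True is only needed when stale calibration data may be present:

```python
# Split convolution weight tensors (in place); wipe any stale calibration data.
ir_mod.split_tensors(force=True)
print(ir_mod.is_split_tensors)
```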
inline_partitions() -> None
¶
Undoes any partitioning in the IRModule and inlines all partitions back into the "main" computational graph, i.e. the inverse operation of IRModule.partition_for_tensorrt().
Returns:

Name | Type | Description |
---|---|---|
None | None | This method operates in place. |
compile(target: Union[str, Dict[str, Any]] = 'llvm', host: Optional[Union[str, Dict[str, Any]]] = None, output_path: Optional[Union[str, Path]] = './compile_output', opt_level: int = 3, set_float16: bool = False, set_channel_layout: Optional[str] = None, export_relay: bool = False, export_metadata: bool = False, force_overwrite: bool = False, uuid: Optional[str] = None, encrypt_password: Optional[str] = None) -> None
¶
Compiles the model for a specified target with various configuration options.
This method compiles the model for a given target, which can be a string or a dictionary specifying the target attributes. The compilation can be customized through various parameters, including optimization level and data type settings.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
target | Union[str, Dict[str, Any]] | Can be one of a literal target string, a target tag (pre-defined target alias), a JSON string describing a configuration, or a dictionary of configuration options. When using a dictionary or JSON string to configure the target, the possible keys are listed below this table. | 'llvm' |
host | Optional[Union[str, Dict[str, Any]]] | Similar to target but for the target host. Can be one of a literal target host string, a target tag (pre-defined target alias), a JSON string describing a configuration, or a dictionary of configuration options. When using a dictionary or JSON string, the possible keys are the same as for target. | None |
output_path | Optional[Union[str, Path]] | The path to save the compiled output, defaults to './compile_output'. | './compile_output' |
opt_level | int | Optimization level, ranging from 0 to 4. Larger numbers correspond to more aggressive compilation optimizations. Default is 3. | 3 |
set_float16 | bool | If True, enables the Float16 data type for all operators where permitted. Default is False. This option is ignored for TensorRT compilation. | False |
set_channel_layout | Optional[str] | Optional specification of the channel layout ("first", "last"); if None, the layout is left unchanged. This option is ignored for TensorRT compilation, which defaults to channel-first. | None |
export_relay | bool | If True, exports the Relay text representation of the model. Default is False. | False |
export_metadata | bool | If True, exports the metadata JSON of the model as a text file. Default is False. | False |
force_overwrite | bool | If True, the method will overwrite the provided output path if it already exists. A ValueError is thrown if False and the output path already exists. Default is False. | False |
uuid | Optional[str] | Optional user-supplied UUID for when the model needs a caller-defined unique identifier; when not set, a random UUID is generated. | None |
encrypt_password | Optional[str] | Optional password for encrypting the model; when set, the output consists of the model file and the key. | None |

Possible keys when target (or host) is given as a dictionary or JSON string:

- kind : str (required). Which codegen path to use, for example "llvm" or "cuda".
- keys : List of str (optional). A set of strategies that can be dispatched to. When using kind="opencl" for example, one could set keys to ["mali", "opencl", "gpu"].
- device : str (optional). A single key that corresponds to the actual device being run on. This will be effectively appended to the keys.
- libs : List of str (optional). The set of external libraries to use. For example ["cblas", "mkl"].
- system-lib : bool (optional). If True, build a module that contains self-registered functions. Useful for environments where dynamic loading like dlopen is banned.
- mcpu : str (optional). The specific CPU being run on. Serves only as an annotation.
- model : str (optional). An annotation indicating what model a workload came from.
- runtime : str (optional). An annotation indicating which runtime to use with a workload.
- mtriple : str (optional). The LLVM triple describing the target, for example "arm64-linux-android".
- mattr : List of str (optional). The LLVM features to compile with, for example ["+avx512f", "+mmx"].
- mfloat-abi : str (optional). An LLVM setting that is one of "hard" or "soft", indicating whether to use hardware or software floating-point operations.
- mabi : str (optional). An LLVM setting. Generate code for the specified ABI, for example "lp64d".
- host : Union[str, Dict[str, Any]] (optional). Description for the target host. Can be recursive. Similar to target.
Returns:

Name | Type | Description |
---|---|---|
None | None | The method operates in place. |
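A compilation sketch for a CPU target described by a configuration dictionary; the "mcpu" annotation is an illustrative assumption about the deployment machine, not a required value:

```python
# Compile for an LLVM CPU target and export the Relay text alongside the artifact.
ir_mod.compile(
    target={"kind": "llvm", "mcpu": "skylake-avx512"},  # assumed deployment CPU
    output_path="./compile_output",
    opt_level=3,
    export_relay=True,
    force_overwrite=True,
)
```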
calibrate(calib_data: Iterable[Any], reset: bool = True, use_cuda: bool = True) -> None
¶
Calibrates the model by tracking intermediate layer statistics.
This method collects statistics from intermediate layers of the model using the provided calibration dataset. These statistics are used for deriving quantization parameters in a subsequent quantization process. It's essential that the calibration data is representative of the model's expected real-world inputs.
Note: Ensure that the calibration data is in the form of numpy arrays and has undergone the necessary pre-processing steps required for the model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
calib_data | Iterable[Any] | An iterable of data samples for calibration. The samples should be in a format compatible with the model's input requirements. Inspect the IRModule.input_shapes property for the expected shapes. | required |
reset | bool | If True, any previous calibration data is cleared before new data is processed. Defaults to True. Additionally, this argument defaults to True when a TensorRT-partitioned model is detected. | True |
use_cuda | bool | If True, Forge will utilize CUDA devices for calibration if GPUs can be found. Operation will fall back to CPU if GPUs are not found. Default is True. | True |
Returns:

Name | Type | Description |
---|---|---|
None | None | This method operates in place. |

Raises:

Type | Description |
---|---|
ValueError | If |
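A calibration sketch for a single-input model; the random batches are only a stand-in for a real, representative, pre-processed dataset, and the per-sample format is assumed to be one array per input:

```python
import numpy as np

# 32 stand-in calibration batches matching the model's first input.
calib_data = (np.random.rand(*ir_mod.input_shapes[0]).astype(ir_mod.input_dtypes[0])
              for _ in range(32))

ir_mod.calibrate(calib_data, reset=True, use_cuda=True)
print(ir_mod.is_calibrated)
```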
quantize(activation_dtype: str = 'int8', kernel_dtype: Optional[str] = None, bias_dtype: Optional[str] = None, per_channel: bool = False, calib_method: str = 'average', quant_type: str = 'any') -> None
¶
Applies quantization to the model with specified parameters.
This method quantizes the model's activations, kernels, and biases to the specified data types. If kernel_dtype and bias_dtype are None, they default to the activation_dtype. The quantization can be "static" (requiring prior calibration), "dynamic" (no calibration needed), or "any" (prioritizing static if possible).
Note: When using "static" quant_type, ensure calibration is performed beforehand or provide calib_data for calibration. If split_tensors is enabled, existing calibration data is discarded due to graph changes, necessitating fresh calib_data.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
activation_dtype | str | Data type for activations ("int8", "uint8"), default is "int8". | 'int8' |
kernel_dtype | Optional[str] | Data type for kernels ("int8", "uint8"), defaults to the activation_dtype when None. | None |
bias_dtype | Optional[str] | Data type for biases, defaults to the activation_dtype when None. | None |
per_channel | bool | If True, performs per-channel quantization on kernels. Default is False. | False |
calib_method | str | Method for calibration ("average", "entropy", "minmax", "percentile"), default is "average". Overview of calibration methods: "average" - computed average of the min-max extrema across calibration data; "entropy" - distribution-based maximization of entropy of quantized values; "minmax" - absolute most extreme min-max values across calibration data; "percentile" - computed 99th-percentile cut-offs across calibration data. | 'average' |
quant_type | str | Type of quantization ("static", "dynamic", "any"), default is "any". | 'any' |
Returns:

Name | Type | Description |
---|---|---|
None | None | This method operates in place. |
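A static-quantization sketch, assuming calibration has already been run as in the calibrate() example above:

```python
# Static int8 quantization using previously captured calibration statistics.
ir_mod.quantize(
    activation_dtype="int8",
    per_channel=True,
    calib_method="entropy",
    quant_type="static",
)
print(ir_mod.is_quantized)
```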
__eq__(other) -> bool
¶
Compares two IRModules for equality by hash
__hash__() -> int
¶
Structural hash of the IRModule
__iter__() -> Iterator
¶
Iterator of the underlying Node
objects in evaluation order