Guide to Introspection with Forge

This guide shows how to introspect a model's many properties through Forge's intermediate representation modules, forge.RelayModule and forge.ONNXModule. These modules expose useful properties that can aid the engineer, scientist, or developer.

Load a Model

Following the guide on loading, let's load a model into Forge's IR.

import forge
import onnx

onnx_model = onnx.load("path/to/model.onnx")

# Load into Forge's Relay-based IR
ir = forge.RelayModule.from_onnx(onnx_model)

# Or, load into Forge's ONNX-based IR
ir = forge.ONNXModule(onnx_model)

Modules in Forge

Forge supports two primary backend intermediate representations (IRs): Relay and ONNX. Relay, from the TVM project, is a flexible, framework-agnostic IR designed for optimizing and compiling deep learning models. ONNX (Open Neural Network Exchange) is a widely adopted standard for representing machine learning models across different frameworks. Forge provides dedicated modules—forge.RelayModule and forge.ONNXModule—to work seamlessly with both IRs, allowing users to introspect, manipulate, and compile models regardless of their original format.

What is a RelayModule?

The forge.RelayModule is Forge's intermediate representation module. It is a framework-agnostic representation of the model that gives the compiler a generalized, standardized abstraction of the model's algorithm. It describes what the algorithm is, not how a device ought to execute it. Because Forge is built atop the open-source TVM project, it adopts TVM's intermediate representation language, Relay. Forge extends TVM by providing a graph backend and a refined API.

Distinction Between Forge and Relay

Note that there is a distinction between the 'Forge RelayModule' and the 'Relay IRModule'. The Forge RelayModule is an object that wraps the Relay IRModule, TVM's native intermediate representation container. The Forge RelayModule aims to provide a one-to-one parallel to the underlying Relay IRModule.
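For example, assuming ir.mod exposes the underlying IR as described later in this guide, the relationship can be checked directly:

import tvm

# The Forge wrapper holds TVM's native IR container (exposed here via .mod)
assert isinstance(ir.mod, tvm.IRModule)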

What is an ONNXModule?

The forge.ONNXModule is Forge's intermediate representation module for ONNX models. ONNX (Open Neural Network Exchange) is an open standard for representing machine learning models, enabling interoperability between different frameworks. The forge.ONNXModule provides a standardized, framework-agnostic abstraction of the model, capturing its computational graph and metadata. This allows Forge to introspect, manipulate, and optimize ONNX models using a unified API, regardless of their original source. Forge extends the capabilities of ONNX and ONNX Runtime by offering a refined API and integration with its graph backend for further analysis and optimization.

Distinction Between Forge and ONNX

The 'Forge ONNXModule' is a wrapper around the native ONNX model object. While the ONNX model defines the structure and parameters of the neural network, the Forge ONNXModule provides additional utilities and a consistent interface for working with ONNX models within the Forge ecosystem. It also wraps around ONNX Runtime to provide simplified calibration and quantization APIs, making it easier to optimize and deploy ONNX models.

Properties of a Module

The following sections describe the readable properties of a forge.RelayModule and forge.ONNXModule.

See the Intermediate Representation

The Relay IRModule can be referenced with the class's mod and typed_mod properties. In a notebook cell, these calls will display the Relay graph as text. Don't be concerned with understanding all the details of the output for now.

ir.mod  # Relay IRModule
ir.typed_mod  # Relay IRModule w/ static typing
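Outside of a notebook, assuming these properties return standard TVM IRModule objects, you can print the same text form explicitly:

print(ir.mod)        # textual Relay graph
print(ir.typed_mod)  # textual Relay graph with inferred types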

Similarly, you can display the ONNX graph using the module's mod property:

ir.mod  # ONNX ModelProto

This will show the underlying ONNX graph structure, allowing you to inspect the model's computational graph and metadata directly.
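For a more readable dump, assuming ir.mod returns a standard onnx.ModelProto, the onnx helper utilities can pretty-print its graph:

import onnx

# Human-readable rendering of the ONNX graph
print(onnx.helper.printable_graph(ir.mod.graph))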

See the Operators (only for RelayModule)

It's simple to get a count of all the distinct operators within a model.

ir.operators  # Dict[str, int]
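As a small sketch, assuming the property returns a plain Dict[str, int] as annotated above, you can rank the operators by frequency:

# Print each distinct operator, most frequent first
for op_name, count in sorted(ir.operators.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{op_name}: {count}")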

Get Input and Output Information

There are a handful of properties that provide quick access to the inputs and outputs of a model.

# input properties
ir.input_count  # int
ir.input_shapes  # List[Tuple[int, ...]]
ir.input_dtypes  # List[str]

# output properties
ir.output_count  # int
ir.output_shapes  # List[Tuple[int, ...]]
ir.output_dtypes  # List[str]
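For instance, assuming the list properties are aligned with one another as annotated above, a quick I/O summary might look like this:

# Summarize the model's inputs and outputs
print(f"inputs: {ir.input_count}")
for shape, dtype in zip(ir.input_shapes, ir.input_dtypes):
    print(f"  {dtype} {shape}")

print(f"outputs: {ir.output_count}")
for shape, dtype in zip(ir.output_shapes, ir.output_dtypes):
    print(f"  {dtype} {shape}")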

Identify your Model (only for RelayModule)

Models can be tricky to identify. Sometimes two files may be duplicates, but how can you be sure? In Forge, there are two ways to distinguish a model's identity.

ir.fingerprint  # str

The fingerprint property is a deterministic hash of a model's Relay structure and weights. One can ascertain that two Forge RelayModules with matching fingerprints are completely identical.

hash(ir)  # int

Hashing a Forge RelayModule with the hash() function produces a deterministic hash of the model's Relay structure (excluding weights); i.e., two models with identical structures, trained on different data sets, will yield matching hashes (but different fingerprints).

Hashing Uniqueness

Both the fingerprint and hash features rely on hashing. Hashing does not guarantee uniqueness, but it is highly improbable for different models to produce matching hashes.
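Putting the two together, here is a minimal sketch in which ir_a and ir_b are hypothetical RelayModules loaded elsewhere:

# Compare two modules by identity strength
if ir_a.fingerprint == ir_b.fingerprint:
    print("identical structure and weights")
elif hash(ir_a) == hash(ir_b):
    print("identical structure, different weights")
else:
    print("different models")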

Inference Debugging

One may want to quickly obtain a Python callable that emulates inference of the underlying model, especially when manipulating the underlying graph. The inference function expects NumPy arrays as positional arguments. It is not an optimized compilation of the model and should only be used as a tool for debugging and validating accuracy.

func = ir.get_inference_function()
func(input_data)  # func(input0, input1, ..., inputN) for multiple inputs
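For example, assuming the input shape and dtype properties shown earlier, you can generate random inputs for a quick sanity check:

import numpy as np

# Build random inputs from the module's input metadata (debugging only)
inputs = [
    np.random.rand(*shape).astype(dtype)
    for shape, dtype in zip(ir.input_shapes, ir.input_dtypes)
]

func = ir.get_inference_function()
outputs = func(*inputs)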

Partitioning the RelayModule

Partitioning a Relay graph for compilation with different compiler backends or hardware leverages the strengths of various execution environments for optimal performance. Essentially, it involves:

  1. Dividing the Graph: Breaking down the computational graph of a model into segments or partitions.

  2. Targeted Execution: Assigning these partitions to different compiler backends or hardware units (like CPUs, GPUs, TPUs) that are best suited for executing them.

  3. Performance Optimization: This approach optimizes the overall performance by ensuring that each part of the model runs on the most efficient platform for its specific type of computation.

In essence, it's about matching different parts of the model with the most effective resources available for their execution.
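Forge's own partitioning workflow is beyond the scope of this guide, but as a rough illustration of the mechanism, here is a minimal sketch applying TVM's standard partitioning passes directly to the wrapped Relay IRModule. It assumes ir.mod exposes the IRModule and that a 'dnnl' external codegen is available in your TVM build:

from tvm import relay, transform

mod = ir.mod  # underlying Relay IRModule

seq = transform.Sequential([
    relay.transform.AnnotateTarget("dnnl"),   # mark ops the external codegen supports
    relay.transform.MergeCompilerRegions(),   # merge adjacent supported regions
    relay.transform.PartitionGraph(),         # split regions into per-target functions
])
partitioned = seq(mod)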