{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quantize and Compile an ONNX Model\n",
    "\n",
    "Forge can ingest transformer models through its [`ONNX` backend](https://docs.latentai.io/leip/optimize/latest/content/api/onnx/irmodule/), which lets you take advantage of state-of-the-art [ONNX models](https://onnxruntime.ai/docs/performance/model-optimizations/ort-format-models.html) and [execution providers](https://onnxruntime.ai/docs/execution-providers/). This tutorial walks you through the steps required to quantize a 32-bit floating-point (FP32) [DETR model from Hugging Face](https://huggingface.co/facebook/detr-resnet-50) to run with 8-bit integer (UINT8) precision.\n",
    "\n",
    "To learn how to use the Forge TVM backend, see the [Forge TVM tutorial](https://docs.latentai.io/leip/optimize/latest/content/notebooks/BYOMwithForgeTutorialTVM/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Environment Setup\n",
    "First, let’s ensure that your environment is set up correctly.\n",
    "\n",
    "You have two options:\n",
    "\n",
    "1. Set up a Docker container.\n",
    "2. Create a conda environment.\n",
    "\n",
    "Follow the [installation guide](https://docs.latentai.io/leip/optimize/latest/content/getting-started/install/) for step-by-step instructions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To run this tutorial, you’ll need the following Python packages:\n",
    "\n",
    "* torch\n",
    "* torchvision\n",
    "* huggingface_hub\n",
    "* optimum\n",
    "* timm\n",
    "* colorama\n",
    "\n",
    "Run the cell below to install them in your environment.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install torch==2.4.1 torchvision==0.19.1 --extra-index-url https://download.pytorch.org/whl/cu121\n",
    "!pip install huggingface_hub colorama optimum timm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import urllib.request\n",
    "import zipfile\n",
    "import numpy as np\n",
    "\n",
    "from optimum.exporters.onnx import main_export\n",
    "\n",
    "import forge\n",
    "\n",
    "from torch.utils.data import Dataset\n",
    "from torchvision import transforms\n",
    "from PIL import Image"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model and Dataset Setup\n",
    "In this step, we’ll set up the resources needed for optimization: organizing the working directories, exporting the `DETR` model from Hugging Face to ONNX with Optimum, and downloading the COCO validation images used for calibration.\n",
    "\n",
    "A clear folder structure keeps the artifacts organized during the tutorial:\n",
    "\n",
    "- The exported input ONNX model goes in the `models` directory.\n",
    "- The COCO `val2017` images used for calibration are extracted to the `val2017` directory.\n",
    "- The optimized ONNX models will be exported to the `optimized_outputs` directory.\n",
    "\n",
    "Run the code below to create the folders, export the model, and download the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_path = \"models\"\n",
    "if not os.path.exists(model_path):\n",
    "    os.makedirs(model_path)\n",
    "if not os.path.exists(f\"{model_path}/detr_resnet50/model.onnx\"):\n",
    "    model_id = \"facebook/detr-resnet-50\"\n",
    "    output_dir = f\"{model_path}/detr_resnet50\"\n",
    "    main_export(\n",
    "        model_name_or_path=model_id,\n",
    "        output=output_dir,\n",
    "        task=\"object-detection\",\n",
    "        opset=12,\n",
    "        device=\"cpu\"\n",
    "    )\n",
    "\n",
    "dataset_dir = \"val2017\"\n",
    "if not os.path.exists(dataset_dir):\n",
    "    print(\"Downloading val2017 dataset from COCO. This is required to quantize our model with INT8 precision\")\n",
    "    url = \"http://images.cocodataset.org/zips/val2017.zip\"\n",
    "    file_path = \"val2017.zip\"\n",
    "    urllib.request.urlretrieve(url, file_path)\n",
    "    with zipfile.ZipFile(file_path, 'r') as zip_ref:\n",
    "        zip_ref.extractall(\".\")\n",
    "    \n",
    "optimized_output_dir = \"optimized_outputs\"\n",
    "if not os.path.exists(optimized_output_dir):\n",
    "    os.makedirs(optimized_output_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading the Model\n",
    "\n",
    "The Forge ONNX backend supports ingesting models either as an `ONNX object` or directly from an `.onnx` file. For more information, consult the [guide on loading models](https://docs.latentai.io/leip/optimize/latest/content/guide/load/).\n",
    "\n",
    "### Important Note\n",
    "Forge does not support ONNX models exported with the TorchDynamo-based exporter (for example, `torch.onnx.dynamo_export`). Only models exported with the TorchScript-based exporter (`torch.onnx.export`, built on `torch.jit`) are supported for ingestion."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ir = forge.ONNXModule(f\"{model_path}/detr_resnet50/model.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the model has been successfully ingested as an IR object, we’re ready to perform transformations and quantization on the model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setting Static Inputs\n",
    "Forge can only export models with static inputs: every input shape must be fully defined and remain constant during inference. To fix the shapes, use the `.set_static_input()` method, which takes a dictionary that maps each input name to its shape."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fix every dimension of the input: [batch, channels, height, width]\n",
    "input_shape_dict = {\n",
    "    \"pixel_values\": [1, 3, 800, 1333]\n",
    "}\n",
    "ir.set_static_input(input_shape_dict)"
   ]
  },
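  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because every dimension has to be a fixed integer, it can help to sanity-check a shape dictionary before passing it to Forge. The sketch below is a minimal illustration of that idea. `find_dynamic_dims` is a hypothetical helper, not part of the Forge API, and the symbolic axis names in `dynamic_shapes` are only examples of how dynamic axes commonly appear in exported models.\n",
    "\n",
    "```python\n",
    "def find_dynamic_dims(shape_dict):\n",
    "    # Hypothetical helper: return (input_name, axis) pairs that are not positive integers\n",
    "    problems = []\n",
    "    for name, shape in shape_dict.items():\n",
    "        for axis, dim in enumerate(shape):\n",
    "            if isinstance(dim, bool) or not isinstance(dim, int) or dim <= 0:\n",
    "                problems.append((name, axis))\n",
    "    return problems\n",
    "\n",
    "static_shapes = {'pixel_values': [1, 3, 800, 1333]}\n",
    "dynamic_shapes = {'pixel_values': ['batch_size', 3, 'height', 'width']}\n",
    "\n",
    "print(find_dynamic_dims(static_shapes))   # []\n",
    "print(find_dynamic_dims(dynamic_shapes))  # [('pixel_values', 0), ('pixel_values', 2), ('pixel_values', 3)]\n",
    "```\n",
    "\n",
    "An empty list means the dictionary describes a fully static input."
   ]
  },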
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quantizing the Model\n",
    "Let's now quantize the model to optimize it for efficient inference. Forge supports two types of quantization:\n",
    "\n",
    "1. Static Quantization: The model is first run on a representative dataset in a calibration step. During calibration, we gather statistics on the distribution of the activations, which are then used to determine the scaling factors (quantization parameters) for each layer.\n",
    "2. Dynamic Quantization: Weights are quantized ahead of time, while the quantization parameters for activations are computed on the fly at runtime. No calibration dataset is required, but the extra runtime computation can reduce inference performance.\n",
    "\n",
    "You can find more details and advanced quantization options in the [guide on quantization](https://docs.latentai.io/leip/optimize/latest/content/guide/quantize/)."
   ]
  },
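  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the difference between the two modes concrete, here is a minimal NumPy sketch of UINT8 affine quantization for a single activation tensor. It is a generic illustration, not Forge's internal implementation: the helper names, the min/max range estimate, and the random stand-in data are all assumptions made for this example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def uint8_params(x_min, x_max):\n",
    "    # Affine mapping of [x_min, x_max] onto [0, 255]; the range must contain zero\n",
    "    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)\n",
    "    scale = (x_max - x_min) / 255.0 or 1.0\n",
    "    zero_point = int(round(-x_min / scale))\n",
    "    return scale, zero_point\n",
    "\n",
    "def quantize(x, scale, zero_point):\n",
    "    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)\n",
    "\n",
    "def dequantize(q, scale, zero_point):\n",
    "    return (q.astype(np.float32) - zero_point) * scale\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "calibration_batches = [rng.normal(0.0, 1.0, size=1000) for _ in range(20)]\n",
    "new_input = rng.normal(0.0, 1.0, size=1000)\n",
    "\n",
    "# Static: parameters are fixed once, from statistics gathered during calibration\n",
    "static_scale, static_zp = uint8_params(\n",
    "    min(b.min() for b in calibration_batches),\n",
    "    max(b.max() for b in calibration_batches),\n",
    ")\n",
    "\n",
    "# Dynamic: parameters are recomputed from each input at runtime\n",
    "dynamic_scale, dynamic_zp = uint8_params(new_input.min(), new_input.max())\n",
    "\n",
    "for label, scale, zp in [('static', static_scale, static_zp), ('dynamic', dynamic_scale, dynamic_zp)]:\n",
    "    restored = dequantize(quantize(new_input, scale, zp), scale, zp)\n",
    "    error = np.abs(restored - new_input).max()\n",
    "    print(f'{label}: scale={scale:.5f} zero_point={zp} max_error={error:.5f}')\n",
    "```\n",
    "\n",
    "The static parameters cost nothing at inference time but are only as good as the calibration data. The dynamic parameters always fit the current input but have to be recomputed on every run."
   ]
  },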
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Loading a Calibration Dataset\n",
    "In this tutorial, we’ll run calibration on 20 images from the COCO dataset. In practice, you should use a larger dataset better suited to your specific model.\n",
    "\n",
    "**Pro Tip**: Modify your calibration dataset according to your model or select some images from your validation dataloader."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "class CustomImageDataset(Dataset):\n",
    "    # Serves the first `end_index` images found in `img_dir` as calibration samples\n",
    "    def __init__(self, img_dir, end_index=20, transform=None):\n",
    "        self.img_dir = img_dir\n",
    "        self.transform = transform\n",
    "        # Sort the file names so the calibration subset is the same on every run\n",
    "        image_files = sorted(f for f in os.listdir(img_dir) if f.lower().endswith(('jpg', 'jpeg', 'png')))\n",
    "        self.end_index = min(end_index, len(image_files))\n",
    "        self.img_labels = image_files[:self.end_index]\n",
    "\n",
    "    def __len__(self):\n",
    "        return self.end_index\n",
    "\n",
    "    def __getitem__(self, idx):\n",
    "        img_path = os.path.join(self.img_dir, self.img_labels[idx])\n",
    "        image = Image.open(img_path).convert(\"RGB\")\n",
    "        if self.transform:\n",
    "            # Add a batch dimension so each sample matches the model's static input shape\n",
    "            image = self.transform(image).unsqueeze(0)\n",
    "        return image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "transform = transforms.Compose([\n",
    "    transforms.Resize((800, 1333)),  # Resize the images to a fixed size\n",
    "    transforms.ToTensor(),          # Convert the images to tensors\n",
    "    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize the images\n",
    "])\n",
    "coco_dataset = CustomImageDataset(img_dir=dataset_dir, transform=transform)"
   ]
  },
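  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dataset above takes a fixed slice of the directory listing. If you would rather calibrate on a random but reproducible sample, one option is to pick the file names up front. The sketch below shows the idea. `pick_calibration_files` is a hypothetical helper, and the generated file names are stand-ins for what `os.listdir(dataset_dir)` would return.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "def pick_calibration_files(file_names, num_samples=20, seed=0):\n",
    "    # Hypothetical helper: draw a reproducible random subset of the file names\n",
    "    candidates = sorted(file_names)\n",
    "    rng = random.Random(seed)\n",
    "    return rng.sample(candidates, min(num_samples, len(candidates)))\n",
    "\n",
    "# Stand-in file names; in this notebook they would come from os.listdir(dataset_dir)\n",
    "file_names = [f'{i:012d}.jpg' for i in range(5000)]\n",
    "subset = pick_calibration_files(file_names, num_samples=20)\n",
    "print(len(subset))  # 20\n",
    "```\n",
    "\n",
    "Sorting before sampling keeps the result independent of the order in which the operating system lists the directory."
   ]
  },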
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Calibrating the Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ir.calibrate(coco_dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Static Quantization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Forge supports different quantization formats for activations and kernels during static quantization. You can choose between the `INT8` and `UINT8` formats based on your model's needs:\n",
    "\n",
    "- Activations: Supported formats are INT8 and UINT8.\n",
    "    - Note: For dynamic quantization, only UINT8 is supported for activations.\n",
    "- Kernels (Weights): Supported formats are INT8 and UINT8.\n",
    "\n",
    "For more advanced options, consult the [guide on quantization](https://docs.latentai.io/leip/optimize/latest/content/guide/quantize/).\n",
    "\n",
    "In this tutorial, we will choose `UINT8` for both activations and kernels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ir.quantize(activation_dtype=\"uint8\", kernel_dtype=\"uint8\", quant_type='static')"
   ]
  },
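  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For intuition about what the two formats mean, the sketch below maps the same float tensor onto the UINT8 range [0, 255] and the INT8 range [-128, 127] using a generic affine scheme. This is an illustration only: the helper `affine_quantize` and the min/max range estimate are assumptions for this example and do not describe how Forge chooses its quantization parameters.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "RANGES = {'uint8': (0, 255), 'int8': (-128, 127)}\n",
    "\n",
    "def affine_quantize(x, dtype):\n",
    "    # Map the float range of x (extended to include zero) onto the integer range of dtype\n",
    "    qmin, qmax = RANGES[dtype]\n",
    "    x_min, x_max = min(float(x.min()), 0.0), max(float(x.max()), 0.0)\n",
    "    scale = (x_max - x_min) / (qmax - qmin) or 1.0\n",
    "    zero_point = int(round(qmin - x_min / scale))\n",
    "    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(dtype)\n",
    "    return q, scale, zero_point\n",
    "\n",
    "weights = np.array([-0.5, 0.0, 0.25, 1.5], dtype=np.float32)\n",
    "for dtype in RANGES:\n",
    "    q, scale, zp = affine_quantize(weights, dtype)\n",
    "    restored = (q.astype(np.float32) - zp) * scale\n",
    "    max_error = float(np.abs(restored - weights).max())\n",
    "    print(dtype, q.tolist(), 'zero_point =', zp, 'max_error =', max_error)\n",
    "```\n",
    "\n",
    "Both formats use 256 levels, so under this scheme they differ only in where the zero point sits within the integer range."
   ]
  },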
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exporting the Model\n",
    "\n",
    "After calibrating and quantizing the model, we can export it using Forge for deployment. Forge supports two export options, and the cell below writes one file for each:\n",
    "\n",
    "* TensorRT Export: Recommended if `TensorRT` is available on your target device, as it offers performance optimizations for NVIDIA GPUs. Pass `is_tensorrt=True` to export the model with TensorRT optimizations.\n",
    "* Non-TensorRT Export: A general export for devices where TensorRT is unavailable. Omit the `is_tensorrt` flag for this export."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "ir.export(f\"{optimized_output_dir}/quantized_model.onnx\", uuid='quantized model', force_overwrite=True)\n",
    "ir.export(f\"{optimized_output_dir}/quantized_trt_model.onnx\", uuid='tensorrt calibrated model', is_tensorrt=True, force_overwrite=True)"
   ]
  },
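  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check that the export produced the expected files, you can list the artifacts and their sizes on disk. The sketch below uses only the standard library. `report_sizes` is a hypothetical helper, and the paths are the ones used earlier in this notebook. How the quantized files compare in size to the FP32 model depends on the export format, so no particular ratio is assumed here.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "def report_sizes(paths):\n",
    "    # Hypothetical helper: map each existing file to its size in MiB, skipping missing files\n",
    "    return {p: os.path.getsize(p) / (1024 * 1024) for p in paths if os.path.isfile(p)}\n",
    "\n",
    "artifacts = [\n",
    "    'models/detr_resnet50/model.onnx',\n",
    "    'optimized_outputs/quantized_model.onnx',\n",
    "    'optimized_outputs/quantized_trt_model.onnx',\n",
    "]\n",
    "for path, size_mib in report_sizes(artifacts).items():\n",
    "    print(f'{path}: {size_mib:.1f} MiB')\n",
    "```\n",
    "\n",
    "A file missing from the output means the corresponding export did not complete."
   ]
  },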
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Head over to the [LEIP Deploy documentation](https://docs.latentai.io/leip/deploy/latest/) to learn how to deploy this exported model on your target!"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
