{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bring Your Own Data (BYOD)\n",
    "\n",
    "In this guide, we’ll walk you through integrating your own dataset into a LEIP Design recipe. We start by reviewing the datasets we offer, then give a step-by-step demonstration using the Road Sign Detection dataset from Kaggle. The same steps apply to any dataset you choose to work with.\n",
    "\n",
    "This guide focuses specifically on **object detection**: it covers integrating **detection datasets** into a LEIP Design recipe.\n",
    "\n",
    "First, we generate our pantry and create a recipe to work with, as shown in the [Getting Started tutorial](https://docs.latentai.io/leip/design/latest/notebooks/GettingStartedwithLEIPDesign/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "import leip_recipe_designer as rd\n",
    "\n",
    "# Define the workspace path\n",
    "workspace = Path('./workspace')\n",
    "\n",
    "# Build the pantry (do not rebuild if it already exists)\n",
    "pantry = rd.Pantry.build(workspace / \"my_combined_pantry\", force_rebuild=False)\n",
    "recipe = rd.create.from_recipe_id('44702', pantry=pantry, allow_upgrade=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before bringing your own dataset, you might want to explore the object detection datasets we provide. These can be a good starting point if you’re looking for a quick test setup."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "> Help: Dataset. \n",
       "> Ingredients that fit:\n",
       "  Index  Parameter                                                                                       Type                                Version    UUID\n",
       "      0  BYOD from Url - PASCAL format                                                                   data_generator.vision.detection.2d  1.0.0      da9ec6daa3a287173c17307eb727a033c356677976662fca84c1d0697c5960ef\n",
       "      1  BYOD from Url - COCO format                                                                     data_generator.vision.detection.2d  1.0.0      0b9bca30d0fe77ee7734e56dc7291272c7e69894d3419ac64f70fe32add19f32\n",
       "      2  BYOD from Url - YOLO format                                                                     data_generator.vision.detection.2d  1.0.0      104f9721404c4653f2eddbe04023b4858a72aff3cec2937e05c714ac1d2d91c8\n",
       "      3  BYOD from Url - KITTI format                                                                    data_generator.vision.detection.2d  1.0.0      6305d7f53f4ae0bc7b669cb105172621f25daaed3934a96063a02dd489ebbf1a\n",
       "      4  BYOD - PASCAL format                                                                            data_generator.vision.detection.2d  1.0.0      26c7b7a439669ce524a8d1128c51df0385bb09460eb26f174e61f5e0dd43ba8a\n",
       "      5  BYOD - COCO format                                                                              data_generator.vision.detection.2d  1.0.0      f1b19f142448b17b02883bcc1663032ad8240928816f4f7db18ff86ba5651238\n",
       "      6  BYOD - YOLO format                                                                              data_generator.vision.detection.2d  1.0.0      cac615b7fb149b5ea1a270c4782eb59202d40e20f77f3caa45e6ce40b221394c\n",
       "      7  BYOD - KITTI format                                                                             data_generator.vision.detection.2d  1.0.0      2eeaaf51d0243838e482dc8f84d8ca3f9c180121c4e281f1758bca5e586b10a5\n",
       "      8  VAID (Overhead Car Detection)                                                                   data_generator.vision.detection.2d  1.0.0      7a04de11614146936318448b40446d2dde40b8ddf2bd695e968778ab35ff36b2\n",
       "      9  Pothole Detection (data/sets/kaggle/detection/andrewmvd-pothole-detection)                      data_generator.vision.detection.2d  1.0.0      7965dc0775eedd4191928e5a9a376ab9617bc72fa0c032459fe2dfef5c90738c\n",
       "     10  COCO                                                                                            data_generator.vision.detection.2d  1.0.0      6e5bcaa5ddb35c0dd91cead96615c5859ea6f207cd0c6a121cc49cf9e85852db\n",
       "     11  Smoke (data/sets/url/detection/smoke-pascal-like)                                               data_generator.vision.detection.2d  1.0.0      64b984853aefcba0f0bad3315f83bdd034215d6c8f12e833d8242a0ea3fb0d36\n",
       "     12  Fire and smoke (data/sets/url/detection/fire-and-smoke-coco-like)                               data_generator.vision.detection.2d  1.0.0      f95a890cc9c2102d0ae2268e2dd771ee6da23608650f0627d2ec459f1b23fc44\n",
       "     13  PASCAL VOC                                                                                      data_generator.vision.detection.2d  1.0.0      d91692760a3fa454bc5aad0ae247fc320c97ad5d0f57fa3d334bac23298bd953\n",
       "     14  COCO Car Detection (data/sets/kaggle/detection/coco-car-dataset)                                data_generator.vision.detection.2d  1.0.0      a7c5c41ec12a9c70f6b8264226968b7498bc9b5d2df8914b0aa5518fd3b661c1\n",
       "     15  Dials and gauges (data/sets/url/detection/dials-and-gauges-pascal-like)                         data_generator.vision.detection.2d  1.0.0      f4271f5edda5ead883767c87bd6564b1eccf4b9021ce36633ecf3d156956923e\n",
       "     16  Dice, Car, Battery Detection (data/sets/kaggle/detection/kitti-dice-car-battery-detection-new)  data_generator.vision.detection.2d  1.0.0      0a0fdb2af87d607288594bc486b767480ac88fab9c04323faa5d74bb15ad9593\n",
       "     17  Ship Detection (data/sets/kaggle/detection/pascal-ship-dataset)                                 data_generator.vision.detection.2d  1.0.0      6ab1c630fc6f71b72489354a10ca8fafe65570334a71f0a7734bf56325d0cc58\n",
       "     18  Zenodo wheat (data/sets/url/detection/zenodo-wheat)                                             data_generator.vision.detection.2d  1.0.0      d9ab9d814df4321828ec0336c3f9ae926362eb977fa07d74871f1b054179d860\n",
       "     19  Attach FiftyOne Dataset                                                                         data_generator.vision.detection.2d  1.0.0      7a12bd14e3ffe82142f62ddae46c9f8bdb0d92c6c0a9fb4c3ae5d50a49fd6908\n",
       "     20  Car Detection (data/sets/kaggle/detection/sshikamaru-car-object-detection)                      data_generator.vision.detection.2d  1.0.0      3ea6d7e169e173e42987169b929a6296175d18f891523463f8695ceabb1d3763\n",
       "     21  Face Mask Detection (data/sets/kaggle/detection/andrewmvd-face-mask-detection)                  data_generator.vision.detection.2d  1.0.0      f5737272f8595201f098a40490631b89416f20a3a9c6112ec3421405a698ea2b\n",
       "     22  Dice Detection (data/sets/kaggle/detection/pascal-dice-dataset)                                 data_generator.vision.detection.2d  1.0.0      ee99df1f35e58cf28bf78049eda3da6a42df44a507a86d23cf66890d05b934ef\n",
       "     23  Fruits Detection (data/sets/kaggle/detection/mbkinaci-fruit-images-for-object-detection)        data_generator.vision.detection.2d  1.0.0      8ecee62ebce897e400bb6d78e57e8724e83e943ba41b5687b6600e0a191cbe99\n",
       "     24  Smoke Detection (data/sets/kaggle/detection/pascal-smoke-dataset)                               data_generator.vision.detection.2d  1.0.0      f0fe4e825584c227287d20e0b3b870aa238e31af4e269cbb28183402d2d6245a\n",
       "     25  Composite - Mosaic                                                                              data_generator.vision.detection.2d  1.3.0      f2ac92f31f08c8219590ea89a37d9b9b1bc21160af22a70b524b70c5ef80e140\n",
       "     26  Composite - Random subset                                                                       data_generator.vision.detection.2d  1.1.0      5c1fb1649a698eeaa89b459ed07646065b56b59f7e2c932ee722a9cc0039a9af\n",
       "     27  Composite - Data joiner                                                                         data_generator.vision.detection.2d  1.1.0      413dbe101b5a31107994d52fe45ad2ed7fd55623df05c4cff640227392512ff8\n",
       "     28  Composite - Matting                                                                             data_generator.vision.detection.2d  1.2.0      495590fd21e17c66705e69820f4416bf1c46bf08e01c6c47d3713f054b4f0b58\n",
       "     29  Composite - Class selector                                                                      data_generator.vision.detection.2d  1.1.0      d11aa4d1d87054674999de3bafb22370582bb73e179f0474ec8b9d25b201793f\n",
       "     30  Unlabeled Dataset                                                                               data_generator.vision.detection.2d  1.0.2      8f26f69bbdbdd5c8bd8d8c960d9449c5a415507383dc1b27a545f37c97da49f9\n",
       "> Use recipe.assign_ingredients('data_generator', ingredient_name) to add it to the recipe.\n",
       "> Or alternatively, use recipe['data_generator'] = ingredient_id."
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "recipe.options(\"data_generator\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Assigning Ingredients to Your Recipe:**\n",
    "\n",
    "To assign a data generator to your recipe, use the `assign_ingredients` method. This approach is recommended when building a recipe from scratch using one of our provided datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'choice_id': 'a7c5c41ec12a9c70f6b8264226968b7498bc9b5d2df8914b0aa5518fd3b661c1',\n",
       "  'choice_name': 'COCO Car Detection (data/sets/kaggle/detection/coco-car-dataset)',\n",
       "  'synonym': 'data_generator',\n",
       "  'parent': 'Basic Adaptor',\n",
       "  'slot': 'slot:module.dataset_generator',\n",
       "  'path': ['slot:data', 'slot:module.dataset_generator']}]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "recipe.assign_ingredients('data_generator', \"COCO Car Detection\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more details on initializing an empty recipe for your tasks, refer to the [Recipe Creators documentation](https://docs.latentai.io/leip/design/latest/content/reference/recipe_creators/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Note:** The `assign_ingredients` method is best used when creating a new recipe, because it clears preprocessing steps such as augmentations and initializes them from scratch.\n",
    "\n",
    "If you are modifying one of our pre-validated \"golden recipes\" and want to retain advanced augmentations, such as mosaicing, that contribute to optimal performance, use the `replace_data_generator` helper instead:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Skipped downloading goldenrecipedb with name \"xval_det\" and variant \"Xval0.3\" (0), as it already exists.\n",
      "This is the Cross-validation volume. Available methods are- \n",
      "get_golden_df \n",
      "describe_table\n"
     ]
    }
   ],
   "source": [
    "recipe = rd.create.from_recipe_id('44702', pantry=pantry, allow_upgrade=True)\n",
    "data = rd.helpers.data.get_data_generator_by_name(pantry=pantry, regex_ingredient_name=\"COCO Car Detection\")\n",
    "rd.helpers.data.replace_data_generator(recipe, data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example Dataset: Road Sign Detection (Kaggle)\n",
    "The steps below will help you retrieve and set up the Road Sign Detection dataset. You can download it directly from [Kaggle](https://www.kaggle.com/datasets/andrewmvd/road-sign-detection).\n",
    "\n",
    "#### Steps to Use Your Own Dataset:\n",
    "1. **Download your dataset**:\n",
    "   - If using the Road Sign Detection dataset from Kaggle, navigate to the dataset page, log in with your Kaggle credentials, and click \"Download.\"\n",
    "   - Unzip the downloaded file and place the dataset in a local directory.\n",
    "   \n",
    "2. **Set the `root_path` for your dataset**:\n",
    "   - After unzipping, set the `root_path` in your code to point to the folder containing your dataset.\n",
    "   - Example for the Road Sign Detection dataset:\n",
    "     ```python\n",
    "     root_path = \"/path/to/road_sign_detection_dataset/\"\n",
    "     ```\n",
    "\n",
    "3. **Supported Dataset Formats**:\n",
    "   - LEIP Design supports several annotation formats, including PASCAL, COCO, YOLO, and KITTI.\n",
    "   - You can also [integrate datasets from FiftyOne](https://docs.latentai.io/leip/design/latest/content/reference/data_helpers/#leip_recipe_designer.helpers.data.attach_fiftyone_data_generator).\n",
    "\n",
    "   If your dataset is in any of these formats, you can easily ingest it into LEIP Design using the provided helpers.\n",
    "\n",
    "4. **Ingest the Dataset into the Recipe**:\n",
    "   - Once the dataset is prepared, you can attach it to the recipe using our [data ingestion helpers](https://docs.latentai.io/leip/design/latest/content/reference/data_helpers/#format-specific-data-generators):\n",
    "     ```python\n",
    "     data = rd.helpers.data.new_pascal_data_generator()  # fill in the arguments based on the docs\n",
    "     rd.helpers.data.replace_data_generator(recipe, data)\n",
    "     ```\n",
    "\n",
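    "Steps 1 and 2 can be sketched with the standard library alone. This is a minimal sketch, not part of LEIP Design: the helper name `extract_dataset` is hypothetical, and the `images` and `annotations` folder names are assumptions based on the Road Sign Detection layout used in this guide.\n",
    "\n",
    "```python\n",
    "import zipfile\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def extract_dataset(archive, target_dir, expected_dirs=('images', 'annotations')):\n",
    "    # Unzip `archive` into `target_dir` and return the absolute dataset root\n",
    "    target = Path(target_dir)\n",
    "    target.mkdir(parents=True, exist_ok=True)\n",
    "    with zipfile.ZipFile(archive) as zf:\n",
    "        zf.extractall(target)\n",
    "    # Fail early if the expected sub-folders are missing\n",
    "    missing = [d for d in expected_dirs if not (target / d).is_dir()]\n",
    "    if missing:\n",
    "        raise FileNotFoundError(f'Missing folders in {target}: {missing}')\n",
    "    return str(target.resolve())\n",
    "```\n",
    "\n",
    "The returned string can then be used as `root_path`. If your archive unpacks into an extra top-level folder, point `target_dir` or `root_path` at that folder instead.\n",
    "\n",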
    "### BYOD Example: Road Sign Detection Dataset\n",
    "For convenience, we host a mirror of the Road Sign Detection dataset. The cell below downloads it through `download_url` and attaches it to the recipe:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a data generator for a Pascal VOC format dataset; root_path must point to the dataset folder\n",
    "data = rd.helpers.data.new_pascal_data_generator(\n",
    "    pantry=pantry,\n",
    "    root_path=\"${paths.cache_dir}/road-sign-data\",\n",
    "    images_dir=\"images\",\n",
    "    annotations_dir=\"annotations\",\n",
    "    nclasses=4,\n",
    "    is_split=False,\n",
    "    trainval_split_ratio=0.80,\n",
    "    trainval_split_seed=42,\n",
    "    dataset_name=\"road-sign-data\",\n",
    "    download_url=\"https://s3.us-west-1.amazonaws.com/leip-showcase.latentai.io/recipes/andrewmvd_road-sign-detection.zip\" # skip if pre-downloaded\n",
    ")\n",
    "\n",
    "rd.helpers.data.replace_data_generator(recipe, data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "recipe[\"data_generator\"]"
   ]
  },
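  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before training, it can help to confirm that `nclasses` matches the labels actually present in your annotations. The sketch below is a standard-library check, not part of LEIP Design. The helper name `count_pascal_classes` is hypothetical, and it assumes Pascal VOC XML files in which each `<object>` element has a `<name>` child.\n",
    "\n",
    "```python\n",
    "import xml.etree.ElementTree as ET\n",
    "from collections import Counter\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def count_pascal_classes(annotations_dir):\n",
    "    # Count how many boxes each class label has across all XML files\n",
    "    counts = Counter()\n",
    "    for xml_file in sorted(Path(annotations_dir).glob('*.xml')):\n",
    "        root = ET.parse(xml_file).getroot()\n",
    "        for obj in root.iter('object'):\n",
    "            name = obj.findtext('name')\n",
    "            if name:\n",
    "                counts[name.strip()] += 1\n",
    "    return counts\n",
    "```\n",
    "\n",
    "Calling it on your annotations folder returns one entry per class, so `len(counts)` should equal the `nclasses` value passed to the data generator."
   ]
  },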
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Additional Resources:\n",
    "- [Supported Dataset Formats](https://docs.latentai.io/leip/design/latest/content/reference/data_helpers/#format-specific-data-generators)\n",
    "- [FiftyOne Integration](https://docs.latentai.io/leip/design/latest/content/reference/data_helpers/#leip_recipe_designer.helpers.data.attach_fiftyone_data_generator)\n",
    "\n",
    "Once your dataset is loaded, you can train the recipe just as you would with any other dataset supported in LEIP Design, as shown in the [Getting Started tutorial](https://docs.latentai.io/leip/design/latest/notebooks/GettingStartedwithLEIPDesign/)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "main",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.20"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
