{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Enhance your Machine Learning workflows with LEIP Design’s new data visualization integration!\n",
    "\n",
    "In this tutorial, we'll explore how to streamline machine learning workflows with LEIP Design's FiftyOne data visualization integration.\n",
    "\n",
    "We'll demonstrate how to visualize your data, manage data versions, filter out noisy samples, and seamlessly ingest your refined dataset into your `LEIP recipe`.\n",
    "\n",
    "Let’s dive into building efficient, performance-driven ML pipelines!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Loading the Dataset in FiftyOne\n",
    "\n",
    "In this step:\n",
    "- We load the dataset using FiftyOne from a VOC Detection format directory.\n",
    "- The dataset is split into `train` (75%) and `val` (25%) subsets.\n",
    "- Finally, we launch the FiftyOne web app on port `8882` to visually inspect the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary libraries\n",
    "import os\n",
    "import datetime\n",
    "import fiftyone as fo\n",
    "from fiftyone import ViewField as F\n",
    "import fiftyone.utils.random as four\n",
    "from pathlib import Path\n",
    "\n",
    "# Define paths for your dataset\n",
    "data_path = 'path/to/your/dataset'  # Replace with the root of your dataset\n",
    "dataset_name = 'surface_defect_detection'\n",
    "images_dir = os.path.join(data_path, 'data')\n",
    "labels_dir = os.path.join(data_path, 'labels')\n",
    "\n",
    "# Generate a unique name for this dataset using the current timestamp\n",
    "current_time = datetime.datetime.now()\n",
    "new_dataset_name = dataset_name + f\"_{current_time.month}_{current_time.day}_{current_time.hour}_{current_time.minute}_{current_time.second}\"\n",
    "\n",
    "# Specify the dataset type (VOCDetection in this case)\n",
    "dataset_type = fo.types.VOCDetectionDataset\n",
    "\n",
    "# Load the dataset from the specified directory\n",
    "dataset = fo.Dataset.from_dir(\n",
    "    dataset_type=dataset_type,\n",
    "    data_path=images_dir,\n",
    "    labels_path=labels_dir,\n",
    "    name=new_dataset_name\n",
    ")\n",
    "\n",
    "# Split the dataset into training (75%) and validation (25%) sets\n",
    "four.random_split(\n",
    "    dataset,\n",
    "    {\"train\": 0.75, \"val\": 0.25},\n",
    "    seed=42,\n",
    ")\n",
    "\n",
    "# Launch the FiftyOne app to explore the dataset\n",
    "session = fo.launch_app(dataset, port=8882, auto=False)\n",
    "session.open_tab()"
   ]
  },
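  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's get a feel for the dataset by printing the ground truth annotations of the first sample, including its labels and bounding boxes. FiftyOne stores each bounding box as `[top-left-x, top-left-y, width, height]` in relative coordinates in `[0, 1]`."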
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_sample = dataset.first()\n",
    "\n",
    "# Get the ground truth detections\n",
    "ground_truth = first_sample.ground_truth  # Replace 'ground_truth' with the name of your field if different\n",
    "\n",
    "# Print ground truth detections\n",
    "print(\"Ground Truth Detections:\")\n",
    "for detection in ground_truth.detections:\n",
    "    print(f\"Label: {detection.label}, Bounding Box: {detection.bounding_box}\")"
   ]
  },
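  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking at one sample only goes so far. As a minimal sketch, assuming the `dataset` object from Step 1 and a detections field named `ground_truth`, FiftyOne's aggregation helpers can summarize the whole dataset: `count_values` tallies detections per class and `count_sample_tags` tallies samples per split tag.\n",
    "\n",
    "```python\n",
    "# Assumes `dataset` was created in Step 1 and its labels live in \"ground_truth\"\n",
    "label_counts = dataset.count_values(\"ground_truth.detections.label\")\n",
    "split_counts = dataset.count_sample_tags()\n",
    "\n",
    "print(\"Detections per class:\", label_counts)\n",
    "print(\"Samples per split tag:\", split_counts)\n",
    "```\n",
    "\n",
    "A heavily skewed class distribution here is a useful signal when deciding which classes to keep in the next step."
   ]
  },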
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Tailoring the Dataset to your specific needs (Optional)\n",
    "\n",
    "Here, we filter the dataset to include only samples with detections for the classes `inclusion` and `patches`.\n",
    "- A new dataset is created with only the filtered samples.\n",
    "- We preserve the `train` and `val` subsets as separate views.\n",
    "- Finally, we inspect the filtered dataset in the FiftyOne app.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the list of classes we are interested in detecting\n",
    "things_i_want_to_detect = ['inclusion', 'patches']\n",
    "\n",
    "# Create a new dataset to store filtered samples\n",
    "custom_dataset_name = f\"{new_dataset_name}_filtered\"\n",
    "custom_dataset = fo.Dataset(custom_dataset_name)\n",
    "\n",
    "# Iterate over samples in the original dataset\n",
    "for sample in dataset:\n",
    "    # Filter detections to keep only the ones matching our target classes\n",
    "    filtered_detections = [\n",
    "        det for det in sample.ground_truth.detections\n",
    "        if det.label in things_i_want_to_detect\n",
    "    ]\n",
    "    \n",
    "    # Skip samples without relevant detections\n",
    "    if not filtered_detections:\n",
    "        continue\n",
    "    \n",
    "    # Create a copy of the sample\n",
    "    new_sample = sample.copy()\n",
    "    \n",
    "    # Overwrite the `ground_truth.detections` with the filtered detections\n",
    "    new_sample[\"ground_truth\"] = fo.Detections(detections=filtered_detections)\n",
    "    \n",
    "    # Add the modified sample to the new dataset\n",
    "    custom_dataset.add_sample(new_sample)\n",
    "\n",
    "# Save the train and validation subsets as views\n",
    "train_view = custom_dataset.match_tags(\"train\")\n",
    "val_view = custom_dataset.match_tags(\"val\")\n",
    "custom_dataset.save_view(\"train_view\", train_view)\n",
    "custom_dataset.save_view(\"val_view\", val_view)\n",
    "\n",
    "# Compute metadata for the filtered dataset\n",
    "custom_dataset.compute_metadata()\n",
    "\n",
    "# Launch the FiftyOne app for the filtered dataset\n",
    "session = fo.launch_app(custom_dataset, port=8882, auto=False)\n",
    "session.open_tab()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_sample = custom_dataset.first()\n",
    "\n",
    "# Get the ground truth detections\n",
    "ground_truth = first_sample.ground_truth  # Replace 'ground_truth' with the name of your field if different\n",
    "\n",
    "# Print ground truth detections\n",
    "print(\"Ground Truth Detections:\")\n",
    "for detection in ground_truth.detections:\n",
    "    print(f\"Label: {detection.label}, Bounding Box: {detection.bounding_box}\")"
   ]
  },
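  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The filtering loop in this step can also be expressed declaratively. The following is a sketch of an alternative, not the approach the rest of this notebook relies on: `filter_labels` with a `ViewField` expression keeps only the matching detections (and, by default, drops samples left with none), and `clone` materializes the resulting view as a new dataset. It assumes `dataset` and `new_dataset_name` from Step 1; the class list mirrors the code cell in this step, and the `_filtered_alt` suffix is a hypothetical name chosen to avoid clashing with `custom_dataset_name`.\n",
    "\n",
    "```python\n",
    "import fiftyone as fo\n",
    "from fiftyone import ViewField as F\n",
    "\n",
    "# Same target classes as the loop-based cell in this step\n",
    "target_classes = [\"inclusion\", \"patches\"]\n",
    "\n",
    "# Keep only detections whose label is in target_classes; samples with\n",
    "# no matching detections are excluded (only_matches=True is the default)\n",
    "filtered_view = dataset.filter_labels(\n",
    "    \"ground_truth\", F(\"label\").is_in(target_classes)\n",
    ")\n",
    "\n",
    "# Materialize the view as a standalone dataset (hypothetical name)\n",
    "alt_dataset = filtered_view.clone(name=f\"{new_dataset_name}_filtered_alt\")\n",
    "\n",
    "# Sanity check: only the target classes should remain\n",
    "print(alt_dataset.count_values(\"ground_truth.detections.label\"))\n",
    "```\n",
    "\n",
    "Because cloning copies sample tags, `alt_dataset.match_tags(\"train\")` and `alt_dataset.match_tags(\"val\")` should still recover the original split."
   ]
  },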
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Ingesting your customized dataset into LEIP\n",
    "\n",
    "In this step:\n",
    "- We create a basic object detection recipe using LEIP Recipe Designer.\n",
    "- The customized dataset is ingested into the recipe via the `attach_fiftyone_data_generator` helper.\n"
   ]
  },
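  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the dataset is handed to LEIP by name, it can be worth confirming that FiftyOne still knows it under that name before attaching it. This is a minimal sketch, assuming `custom_dataset` and `custom_dataset_name` from Step 2; that the helper looks the dataset up in the local FiftyOne database is an assumption suggested by its `dataset_name` argument, not something this tutorial states.\n",
    "\n",
    "```python\n",
    "import fiftyone as fo\n",
    "\n",
    "# Check that the filtered dataset is registered under the expected name\n",
    "print(custom_dataset_name in fo.list_datasets())\n",
    "\n",
    "# List the saved views created in Step 2\n",
    "print(custom_dataset.list_saved_views())\n",
    "\n",
    "# Optional: keep the dataset in FiftyOne's database after this session ends\n",
    "custom_dataset.persistent = True\n",
    "```\n",
    "\n",
    "Datasets are non-persistent by default, so marking the dataset as persistent matters mainly if training runs in a later session or a separate process."
   ]
  },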
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import LEIP Recipe Designer and helper functions\n",
    "import leip_recipe_designer as rd\n",
    "from leip_recipe_designer.create import empty_detection_recipe\n",
    "from leip_recipe_designer.helpers.data import replace_data_generator, attach_fiftyone_data_generator\n",
    "\n",
    "# Define the workspace and pantry paths\n",
    "workspace = Path(os.getcwd())\n",
    "pantry = rd.Pantry.build(workspace / \"my_combined_pantry\", force_rebuild=False)\n",
    "\n",
    "# Create an empty recipe for object detection\n",
    "recipe = empty_detection_recipe(pantry=pantry)\n",
    "recipe.fill_empty_recursively()\n",
    "\n",
    "#Attach the filtered FiftyOne dataset as a data generator\n",
    "datagen = attach_fiftyone_data_generator(\n",
    "    pantry=pantry, \n",
    "    dataset_name=custom_dataset_name, \n",
    "    nclasses=2, \n",
    "    groundtruth_field_name='ground_truth',\n",
    ")\n",
    "\n",
    "# Replace the recipe's data generator with the one attached above\n",
    "replace_data_generator(recipe, datagen)\n",
    "\n",
    "# Fill the recipe again to ensure completeness\n",
    "_ = recipe.fill_empty_recursively()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's print the `data_generator` ingredient of the recipe to confirm that our custom dataset was ingested successfully:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "recipe['data_generator']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Congratulations! You have successfully visualized, customized, and ingested your dataset into LEIP in just a few minutes. \n",
    "\n",
    "From here, you can follow the steps shown in the [Getting Started tutorial](https://docs.latentai.io/leip/design/latest/notebooks/GettingStartedwithLEIPDesign/) to train, evaluate, optimize, and deploy your model!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "amj",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
