Optimize OTX Data Augmentations pipeline

Created by: kprokofi

📄 Description

Current Problem:

  1. Mixed augmentation pipelines: OTX currently combines torchvision and OpenMMLab augmentations (some re-implemented in the repo). Parts of the pipeline operate on NumPy images while torchvision expects PyTorch tensors, forcing repeated tensor/NumPy conversions that create a performance bottleneck during training.

  2. Inconsistent interfaces: Parameter names, formats, and APIs differ across the various augmentation implementations, making the codebase harder to maintain and extend.

  3. Redundant self-implemented augmentations: Several augmentations are re-implemented locally in OTX, even though robust third-party solutions exist and are actively maintained.
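Problem 1 can be sketched with a toy example; `legacy_numpy_blur` below is a hypothetical stand-in for an OTX-local, NumPy-only augmentation, not actual OTX code:

```python
import numpy as np
import torch

def legacy_numpy_blur(img: np.ndarray) -> np.ndarray:
    # Hypothetical NumPy-based augmentation (stand-in for an
    # OpenMMLab/self-implemented op that only accepts ndarrays).
    return img  # identity, for illustration only

def mixed_pipeline(batch: torch.Tensor) -> torch.Tensor:
    # torchvision-style ops work directly on tensors...
    batch = torch.flip(batch, dims=[-1])  # e.g. horizontal flip
    out = []
    for img in batch:
        # ...but the NumPy op forces a device->host copy and back
        # for every image, which is the bottleneck described above.
        np_img = img.cpu().numpy()
        np_img = legacy_numpy_blur(np_img)
        out.append(torch.from_numpy(np_img).to(img.device))
    return torch.stack(out)

batch = torch.rand(4, 3, 64, 64)  # B, C, H, W
result = mixed_pipeline(batch)
print(result.shape)
```

A tensor-only pipeline would skip the per-image `.cpu().numpy()` / `from_numpy` round trip entirely.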


Proposed Solution:

Evaluate Kornia as the primary augmentation library for OTX. Kornia is a differentiable computer vision library built on top of PyTorch, with an extensive set of augmentation operators.

Key benefits of Kornia:

  • 🟢 PyTorch-first design: All augmentations operate directly on PyTorch tensors, removing unnecessary conversions from/to NumPy and improving training efficiency.
  • 🟢 Rich functionality: Provides a wide range of augmentations, including geometric, color, and intensity transformations, as well as advanced ones (e.g., motion blur, random perspective, cutout).
  • 🟢 Differentiable & GPU-accelerated: Augmentations are differentiable and can run batched on the GPU, which speeds up augmentation-heavy pipelines and enables gradient-based techniques such as learned augmentation policies.
  • 🟢 Unified API & consistency: Standardized function signatures simplify maintainability and reduce code complexity.
  • 🟢 Extended support: Handles not only images but also masks, bounding boxes, and keypoints, aligning with the requirements of detection and segmentation tasks.

Benchmarking Plan (Kornia vs torchvision.v2):

  • Compare throughput (images/sec) during training with augmentation-heavy pipelines.
  • Measure GPU utilization and memory footprint when using Kornia vs torchvision v2.
  • Evaluate coverage of transformations (ensure Kornia includes all augmentations currently used in OTX).
  • Validate annotation consistency (masks, bounding boxes, keypoints remain synchronized after augmentation).
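A minimal harness for the throughput measurement in the first bullet could look as follows; `dummy_pipeline` is a placeholder to be swapped for the real Kornia and torchvision v2 stacks under test:

```python
import time
import torch

def measure_throughput(pipeline, batch: torch.Tensor, iters: int = 50) -> float:
    """Return augmented images/sec for a tensor-in, tensor-out pipeline."""
    pipeline(batch)  # warm-up run (also triggers any lazy initialization)
    if batch.is_cuda:
        torch.cuda.synchronize()  # don't time queued-but-unfinished kernels
    start = time.perf_counter()
    for _ in range(iters):
        pipeline(batch)
    if batch.is_cuda:
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return iters * batch.shape[0] / elapsed

# Placeholder pipeline; replace with the Kornia / torchvision v2 pipelines.
def dummy_pipeline(x: torch.Tensor) -> torch.Tensor:
    return torch.flip(x, dims=[-1]) * 1.01

batch = torch.rand(8, 3, 64, 64)
ips = measure_throughput(dummy_pipeline, batch)
print(f"{ips:.0f} images/sec")
```

GPU memory footprint can be sampled in the same loop via `torch.cuda.max_memory_allocated()` when running on CUDA.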

🎯 Objective

  • Unify OTX augmentation pipeline under a PyTorch-first approach.
  • Improve training performance by reducing tensor/NumPy conversions.
  • Increase maintainability and clarity of augmentation code by eliminating redundant implementations.
  • Provide benchmark results to decide between Kornia and torchvision v2.