YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary
Authors: Hao-Tang Tsui, Chien-Yao Wang, Hong-Yuan Mark Liao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that using the RD significantly improves model performance, achieving more than a 3% increase in mean Average Precision for object detection with less than a 1% increase in model parameters. Beyond YOLO, the RD module improves the effectiveness of 2-stage models and DETR-based architectures, such as Faster R-CNN and Deformable DETR. |
| Researcher Affiliation | Academia | Hao-Tang Tsui, Chien-Yao Wang & Hong-Yuan Mark Liao, Institute of Information Science, Academia Sinica |
| Pseudocode | Yes | A.6 Pseudo Code of Full Training Process of Retriever-Dictionary Model. Algorithm 1: Train a model with Retriever Dictionary. Data: Dataset with images and bounding boxes. Result: Trained model with Retriever Dictionary. // Initialization of the Dictionary |
| Open Source Code | Yes | Code is released at https://github.com/henrytsui000/YOLO. |
| Open Datasets | Yes | We primarily validated the method on the Microsoft COCO dataset (Lin et al., 2014), training on the COCO 2017 train set and evaluating on the COCO 2017 validation set... For the Classification task, we used the CIFAR-100 dataset with the YOLOv9-classify model, using top-1 and top-5 accuracy as metrics... Using pre-trained weights from the MSCOCO dataset, we trained the model on the VOC (Everingham et al., 2010) dataset with three learning rate schedules: 10-epoch fast training, 100-epoch full-tuning, and training from scratch. |
| Dataset Splits | Yes | We primarily validated the method on the Microsoft COCO dataset (Lin et al., 2014), training on the COCO 2017 train set and evaluating on the COCO 2017 validation set... For Object Detection and Segmentation, we respectively used mAP and mAP@.5 as evaluation metrics... For the Classification task, we used the CIFAR-100 dataset... Latency was measured on a single Nvidia 3090 GPU without any external acceleration tools, using milliseconds per batch with a batch size of 32 on the MSCOCO validation set. |
| Hardware Specification | Yes | All experiments were conducted using 8 Nvidia V100 GPUs... The classification task on CIFAR-100 (Krizhevsky et al., 2009) took 2 hours on a single Nvidia 4090 GPU for 100 epochs. Latency was measured on a single Nvidia 3090 GPU without any external acceleration tools, using milliseconds per batch with a batch size of 32 on the MSCOCO validation set. |
| Software Dependencies | No | We primarily validated the method on the Microsoft COCO dataset (Lin et al., 2014), training on the COCO 2017 train set and evaluating on the COCO 2017 validation set. For Object Detection and Segmentation, we respectively used mAP and mAP@.5 as evaluation metrics, testing on YOLOv7, YOLOv9, Faster RCNN, and Deformable DETR. For the Classification task, we used the CIFAR-100 dataset with the YOLOv9-classify model, using top-1 and top-5 accuracy as metrics... We also trained a modified Faster RCNN, based on the mm-detection framework, for a maximum of 120 epochs over 3 days. |
| Experiment Setup | Yes | In the main series of experiments, we trained a modified YOLOv7 model, which included the addition of a Retriever-Dictionary Module, for 300 epochs in 2 days. The YOLOv9-based model was trained for 5 days with 500 epochs. We also trained a modified Faster RCNN, based on the mm-detection framework, for a maximum of 120 epochs over 3 days. For Deformable DETR, we trained for approximately 120 epochs over 7 days. The classification task on CIFAR-100 (Krizhevsky et al., 2009) took 2 hours on a single Nvidia 4090 GPU for 100 epochs. Latency was measured on a single Nvidia 3090 GPU without any external acceleration tools, using milliseconds per batch with a batch size of 32 on the MSCOCO validation set. |
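The Pseudocode row quotes Algorithm 1, which trains a detector with a Retriever-Dictionary (RD): a retriever scores a feature vector against a fixed dictionary of knowledge "atoms" and fuses the retrieved combination back into the feature. The sketch below is a hedged, minimal reconstruction of that idea in plain Python; the dictionary initialization, dimensions, scoring rule, and residual fusion weight are all illustrative assumptions, not the authors' released code (see their repository for the real implementation).

```python
# Minimal illustrative sketch of a Retriever-Dictionary lookup.
# Assumptions (NOT from the paper's code): softmax dot-product retrieval,
# random dictionary atoms, and residual fusion with weight `alpha`.
import math
import random

random.seed(0)

DIM, ATOMS = 8, 16  # illustrative sizes, not the paper's

# Dictionary of ATOMS atoms of dimension DIM. In the paper this explicit
# knowledge comes from pretrained models (e.g. a VLM); here it is random.
dictionary = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(ATOMS)]

def retrieve(feature):
    """Score each atom by dot product, softmax the scores, mix the atoms."""
    scores = [sum(f * a for f, a in zip(feature, atom)) for atom in dictionary]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of atoms = retrieved "explicit knowledge" vector.
    return [sum(w, 0.0) for w in
            ([wt * atom[i] for wt, atom in zip(weights, dictionary)]
             for i in range(DIM))]

def fuse(feature, alpha=0.1):
    """Residual fusion: enrich the feature with retrieved knowledge."""
    r = retrieve(feature)
    return [f + alpha * ri for f, ri in zip(feature, r)]

x = [random.gauss(0, 1) for _ in range(DIM)]
y = fuse(x)
```

In a full training loop, `fuse` would sit between the backbone and the detection head, and the dictionary (initialized once, per Algorithm 1) would be updated jointly with the detector weights.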
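The latency protocol described in the Experiment Setup row (milliseconds per batch, batch size 32, no external acceleration tools) can be sketched as below. This is a generic timing harness under stated assumptions, not the authors' script: `model` is a placeholder workload, and the warm-up/averaged-run structure is a common convention the report does not specify. On a GPU one would additionally synchronize before reading the clock.

```python
# Hedged sketch of a per-batch latency measurement (ms/batch, batch size 32).
# `model` is a stand-in workload, not the paper's YOLO forward pass.
import time

BATCH = 32

def model(batch):
    # Placeholder computation standing in for a detector forward pass.
    return [sum(x) for x in batch]

def measure_latency_ms(fn, batch, warmup=3, runs=10):
    for _ in range(warmup):      # warm-up iterations, excluded from timing
        fn(batch)
    start = time.perf_counter()
    for _ in range(runs):
        fn(batch)
    return (time.perf_counter() - start) / runs * 1000.0  # ms per batch

batch = [[1.0] * 64 for _ in range(BATCH)]
ms = measure_latency_ms(model, batch)
```

Averaging over several runs after a warm-up reduces jitter from caches and scheduler noise; the paper's numbers were taken on an Nvidia 3090 on the MSCOCO validation set.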