YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary

Authors: Hao-Tang Tsui, Chien-Yao Wang, Hong-Yuan Mark Liao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments show that using the RD significantly improves model performance, achieving more than a 3% increase in mean Average Precision for object detection with less than a 1% increase in model parameters. Beyond YOLO, the RD module improves the effectiveness of two-stage models and DETR-based architectures, such as Faster R-CNN and Deformable DETR.
Researcher Affiliation | Academia | Hao-Tang Tsui, Chien-Yao Wang & Hong-Yuan Mark Liao, Institute of Information Science, Academia Sinica.
Pseudocode | Yes | A.6 Pseudo Code of Full Training Process of Retriever-Dictionary Model. Algorithm 1: Train a model with Retriever-Dictionary. Data: Dataset with images and bounding boxes. Result: Trained model with Retriever-Dictionary. // Initialization of the Dictionary
Open Source Code | Yes | Code is released at https://github.com/henrytsui000/YOLO.
Open Datasets | Yes | We primarily validated the method on the Microsoft COCO dataset (Lin et al., 2014), training on the COCO 2017 train set and evaluating on the COCO 2017 validation set... For the classification task, we used the CIFAR-100 dataset with the YOLOv9-classify model, using top-1 and top-5 accuracy as metrics... Using pre-trained weights from the MSCOCO dataset, we trained the model on the VOC (Everingham et al., 2010) dataset with three learning-rate schedules: 10-epoch fast training, 100-epoch full tuning, and training from scratch.
Dataset Splits | Yes | We primarily validated the method on the Microsoft COCO dataset (Lin et al., 2014), training on the COCO 2017 train set and evaluating on the COCO 2017 validation set... For object detection and segmentation, we respectively used mAP and mAP@.5 as evaluation metrics... For the classification task, we used the CIFAR-100 dataset... Latency was measured on a single Nvidia 3090 GPU without any external acceleration tools, using milliseconds per batch with a batch size of 32 on the MSCOCO validation set.
Hardware Specification | Yes | All experiments were conducted using 8 Nvidia V100 GPUs... The classification task on CIFAR-100 (Krizhevsky et al., 2009) took 2 hours on a single Nvidia 4090 GPU for 100 epochs. Latency was measured on a single Nvidia 3090 GPU without any external acceleration tools, using milliseconds per batch with a batch size of 32 on the MSCOCO validation set.
Software Dependencies | No | The paper names frameworks but no software versions: testing on YOLOv7, YOLOv9, Faster R-CNN, and Deformable DETR... We also trained a modified Faster R-CNN, based on the mm-detection framework, for a maximum of 120 epochs over 3 days.
Experiment Setup | Yes | In the main series of experiments, we trained a modified YOLOv7 model, which included the addition of a Retriever-Dictionary module, for 300 epochs in 2 days. The YOLOv9-based model was trained for 500 epochs over 5 days. We also trained a modified Faster R-CNN, based on the mm-detection framework, for a maximum of 120 epochs over 3 days. For Deformable DETR, we trained for approximately 120 epochs over 7 days. The classification task on CIFAR-100 (Krizhevsky et al., 2009) took 2 hours on a single Nvidia 4090 GPU for 100 epochs. Latency was measured on a single Nvidia 3090 GPU without any external acceleration tools, using milliseconds per batch with a batch size of 32 on the MSCOCO validation set.
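The Pseudocode row above refers to Algorithm 1, which trains a model after initializing a dictionary and retrieving knowledge from it per feature. As a rough illustration of the retrieve-and-combine idea only, here is a minimal numpy sketch: a linear retriever scores a dictionary of atoms against a backbone feature and adds the softmax-weighted combination back to the feature. All shapes, names, and the softmax retrieval rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class RetrieverDictionary:
    """Hypothetical sketch of a Retriever-Dictionary (RD) forward pass."""

    def __init__(self, feat_dim: int, num_atoms: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Dictionary of compact "explicit knowledge" atoms. Initialized
        # randomly here; the paper initializes the dictionary from data.
        self.dictionary = rng.standard_normal((num_atoms, feat_dim))
        # Linear retriever that scores each atom's relevance to a feature.
        self.retriever = rng.standard_normal((feat_dim, num_atoms)) * 0.1

    def __call__(self, feat: np.ndarray) -> np.ndarray:
        scores = feat @ self.retriever                  # (num_atoms,)
        scores -= scores.max()                          # numerical stability
        coeffs = np.exp(scores) / np.exp(scores).sum()  # softmax weights
        knowledge = coeffs @ self.dictionary            # retrieved knowledge
        return feat + knowledge                         # enriched feature

rd = RetrieverDictionary(feat_dim=8, num_atoms=32)
backbone_feat = np.ones(8)
enriched = rd(backbone_feat)
print(enriched.shape)  # (8,)
```

In this toy form the enriched feature keeps the backbone feature's dimensionality, which is consistent with the paper's claim that the module adds under 1% to the parameter count: only the retriever projection and the dictionary atoms are new parameters.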