CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection

Authors: Qibo Chen, Weizhong Jin, Jianyue Ge, Mengdi Liu, Yuchao Yan, Jian Jiang, Li Yu, Xuanjiang Guo, Shuchang Li, Jianzhong Chen

AAAI 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | CP-DETR demonstrates superior universal detection performance across a broad spectrum of scenarios. For example, our Swin-T backbone model achieves 47.6 zero-shot AP on LVIS, and the Swin-L backbone model achieves 32.2 zero-shot AP on ODinW35. Furthermore, our visual prompt generation method achieves 68.4 AP on COCO val by interactive detection, and the optimized prompt achieves 73.1 fully-shot AP on ODinW13.

Researcher Affiliation | Industry | China Mobile (Zhejiang) Research & Innovation Institute EMAIL

Pseudocode | No | The paper describes its methods through architectural diagrams (Figure 1, Figure 2) and mathematical equations, but it does not contain any structured pseudocode or algorithm blocks.

Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository.

Open Datasets | Yes | For the object level, we use publicly available detection datasets, which contain the Objects365 (Shao et al. 2019) (O365), Open Images (Kuznetsova et al. 2020) (OI), V3Det (Wang et al. 2023), LVIS (Gupta, Dollar, and Girshick 2019), and COCO (Lin et al. 2014) datasets. For grounding or REC data, we used the GoldG (Kamath et al. 2021), RefCOCO/+/g (Yu et al. 2016; Mao et al. 2016), Visual Genome (Krishna et al. 2017) (VG), and PhraseCut (Wu et al. 2020) datasets.

Dataset Splits | No | The paper lists the datasets used for training and evaluation (e.g., O365, V3Det, GoldG, COCO, LVIS, ODinW) and refers to standard evaluation benchmarks (COCO, LVIS, ODinW, RefC), implying their standard test splits, but it does not explicitly describe how the combined training data or any specific dataset was split into training, validation, and test sets, nor does it specify exact percentages or sample counts for these splits.

Hardware Specification | Yes | In all experiments, we use AdamW as the optimizer with weight decay set to 1e-4 and set a minibatch to 32 on 8 A100 40GB GPUs.

Software Dependencies | No | The paper mentions specific models such as "CLIP-L" and "Swin-Tiny and Swin-Large" and the AdamW optimizer, but it does not specify version numbers for core software components such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA libraries.

Experiment Setup | Yes | In all experiments, we use AdamW as the optimizer with weight decay set to 1e-4 and set a minibatch to 32 on 8 A100 40GB GPUs. In pre-training, the learning rate was set to 1e-5 for the text encoder and image backbone and 1e-4 for the rest of the modules, and a decay of 0.1 was applied at 80% and 90% of the total training steps. In visual prompt training, the O365, V3Det, GoldG, and OI datasets are used, the learning rate of the visual prompt encoder is set to 1e-4, and training is performed for 0.5M iterations. In the optimized prompt, the learning rate of the embedding layer is set to 5e-2, the total number of training epochs is 24, and a decay of 0.1 is applied at 80% of the total training steps.
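The step-decay schedule quoted above (multiply the learning rate by 0.1 when training reaches 80% and 90% of the total steps) can be sketched in plain Python. This is a minimal illustration, not the authors' code; the function name `lr_at_step` and the milestone fractions passed to it are our own, with the values (base LR 1e-4, decay factor 0.1, milestones at 80%/90%) taken from the setup described in the paper.

```python
def lr_at_step(step, total_steps, base_lr, milestones=(0.8, 0.9), gamma=0.1):
    """Step-decay schedule: multiply base_lr by gamma at each
    milestone, where milestones are fractions of total_steps."""
    lr = base_lr
    for frac in milestones:
        if step >= int(frac * total_steps):
            lr *= gamma
    return lr

# Pre-training recipe from the paper: LR 1e-4 for most modules
# (1e-5 for the text encoder and image backbone), decayed by 0.1
# at 80% and 90% of the total training steps.
total = 1_000_000  # hypothetical step count for illustration
early = lr_at_step(100_000, total, 1e-4)   # before any milestone: 1e-4
mid = lr_at_step(850_000, total, 1e-4)     # past 80%: decayed once
late = lr_at_step(950_000, total, 1e-4)    # past 90%: decayed twice
print(early, mid, late)
```

The optimized-prompt stage follows the same pattern with a single milestone: `lr_at_step(step, total, 5e-2, milestones=(0.8,))` reproduces its decay of 0.1 at 80% of training.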