Open-Det: An Efficient Learning Framework for Open-Ended Detection

Authors: Guiping Cao, Tao Wang, Wenjian Huang, Xiangyuan Lan, Jianguo Zhang, Dongmei Jiang

ICML 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | 4. Experiments, 4.1. Main Results, 4.2. Ablation Studies, 4.3. Improvements in VL Alignment Scores |
| Researcher Affiliation | Academia | 1 Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China. 2 Pengcheng Laboratory, Shenzhen, China. 3 Pazhou Laboratory (Huangpu). 4 Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China. Correspondence to: Xiangyuan Lan <EMAIL>, Jianguo Zhang <EMAIL>. |
| Pseudocode | No | The paper describes its methods in prose and mathematical formulations and includes architectural diagrams (Figures 2, 3, and 5), but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source codes are available at: https://github.com/Med-Process/Open-Det. |
| Open Datasets | Yes | Datasets. We train our model with a small set of detection data Visual Genome (VG) (Krishna et al., 2017)... our model is evaluated on the commonly used zero-shot LVIS (Gupta et al., 2019) dataset... The COCO2017 (Lin et al., 2014) and Objects365 (Shao et al., 2019) are also used for performance evaluation. |
| Hardware Specification | Yes | fewer GPU resources (4 V100 vs. 16 A100); our models are trained with a mini-batch size of 8 on 4 Tesla V100 GPUs; Swin-Small (using 4 V100 GPUs) and Swin-Large (using 4 A800 GPUs) |
| Software Dependencies | No | The paper mentions software components such as FLAN-T5-base and the AdamW optimizer, but does not provide version numbers for these or for other software libraries or programming environments. |
| Experiment Setup | Yes | our models are trained with a mini-batch size of 8 on 4 Tesla V100 GPUs, using the AdamW optimizer (Loshchilov, 2017) with a weight decay of 0.05. The learning rates are configured as follows: 1×10⁻⁴ for both the Object Detector and Prompts Distiller, and 2×10⁻⁴ for the Object Name Generator; threshold τ (default: 0.99), α (default: 0.25) |
| Dataset Splits | Yes | We train our model with a small set of detection data Visual Genome (VG) (Krishna et al., 2017), which contains 77,398 images for training... evaluating on the 5k Mini Val subset of the LVIS (Gupta et al., 2019) dataset. |
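The experiment-setup row maps directly onto a per-component optimizer configuration. A minimal Python sketch of how the reported hyperparameters could be organized; the component keys and the helper `build_param_groups` are illustrative names, not identifiers from the paper or its released code:

```python
# Hypothetical sketch of the training setup reported above.
# Values come from the quoted experiment-setup text; all names are assumptions.

TRAIN_CONFIG = {
    "batch_size": 8,          # mini-batch size of 8 (4 Tesla V100 GPUs)
    "optimizer": "AdamW",     # AdamW (Loshchilov, 2017)
    "weight_decay": 0.05,
    "lr": {
        "object_detector": 1e-4,        # Object Detector
        "prompts_distiller": 1e-4,      # Prompts Distiller
        "object_name_generator": 2e-4,  # Object Name Generator
    },
    "threshold_tau": 0.99,    # default τ
    "alpha": 0.25,            # default α
}


def build_param_groups(named_params, config):
    """Assign each component's parameters an AdamW-style group with its own
    learning rate and the shared weight decay."""
    return [
        {
            "name": name,
            "params": params,
            "lr": config["lr"][name],
            "weight_decay": config["weight_decay"],
        }
        for name, params in named_params.items()
    ]
```

Such a list of parameter groups is the standard way to pass per-module learning rates to an AdamW-style optimizer; the actual Open-Det code may structure this differently.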