Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection

Authors: Chuhan Zhang, Chaoyang Zhu, Pingcheng Dong, Long Chen, Dong Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experimental results demonstrate that our method achieves performance gains of +2.9% and +10.2% AP50 over previous state-of-the-arts on the challenging COCO benchmark, both without and with a stronger teacher model. The code is provided at https://github.com/ZCHUHan/CCKT-Det. ... We validate the effectiveness of the proposed CCKT-Det across the COCO (Lin et al., 2014), LVIS (Gupta et al., 2019), and Objects365 (Shao et al., 2019) transfer benchmarks. ... 4 EXPERIMENTS Datasets and evaluation metrics. Following the OV-COCO benchmark (Zareian et al., 2021), we train our model on 48 base categories and test on both the 48 base classes (CB) and 17 novel categories (CN). ... 4.2 ABLATION ANALYSIS We conduct ablation studies on OV-COCO (Zareian et al., 2021) to evaluate the effectiveness of each proposed component, hyperparameters, and scalability.
Researcher Affiliation | Academia | Chuhan Zhang (1,2), Chaoyang Zhu (1), Pingcheng Dong (1), Long Chen (1), Dong Zhang (1,2); 1: The Hong Kong University of Science and Technology; 2: AI Chip Center for Emerging Smart Systems (ACCESS)
Pseudocode | No | The paper describes the methodology in prose and figures (Figure 1, Figure 2, Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is provided at https://github.com/ZCHUHan/CCKT-Det.
Open Datasets | Yes | Comprehensive experimental results demonstrate that our method achieves performance gains of +2.9% and +10.2% AP50 over previous state-of-the-arts on the challenging COCO benchmark... We validate the effectiveness of the proposed CCKT-Det across the COCO (Lin et al., 2014), LVIS (Gupta et al., 2019), and Objects365 (Shao et al., 2019) transfer benchmarks.
Dataset Splits | Yes | Following the OV-COCO benchmark (Zareian et al., 2021), we train our model on 48 base categories and test on both the 48 base classes (CB) and 17 novel categories (CN). We also conduct experiments on LVIS (Gupta et al., 2019), where 866 common and 405 frequent classes are treated as base categories, and 337 rare classes are considered novel categories.
Hardware Specification | Yes | A batch size of 2 × 6 (RTX-3090 GPUs) is employed, with AdamW optimizer.
Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | A batch size of 2 × 6 (RTX-3090 GPUs) is employed, with AdamW optimizer. Gradient clipping is adopted with a maximum norm of 0.1. Following (Carion et al., 2020; Zareian et al., 2021), τ is set to 0.07, and the loss weights are set to λgIoU = 2.0, λL1 = 5.0, λcls = 2.0, and λcontrast = 1.0. The number of template prompts is set to 12 following (Cho et al., 2024). ... The model is trained with semantic guidance in Sec. 3.2 for the first 30 epochs with a base learning rate of 10⁻⁴. Subsequently, training continues with the incorporation of regional contrastive distillation loss, accompanied by a learning rate decay factor of 0.1 after the 31st epoch.
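The quoted setup can be condensed into a configuration sketch. The snippet below is a hypothetical illustration only: the names, the config layout, and the schedule helpers are my assumptions, not the authors' code (their actual implementation is in the linked repository).

```python
# Hypothetical configuration mirroring the setup quoted above.
# The authors' real config lives at https://github.com/ZCHUHan/CCKT-Det.

CONFIG = {
    "batch_size_per_gpu": 2,
    "num_gpus": 6,                 # 6x RTX-3090, effective batch size 12
    "optimizer": "AdamW",
    "base_lr": 1e-4,
    "grad_clip_max_norm": 0.1,
    "temperature_tau": 0.07,       # contrastive temperature
    "num_template_prompts": 12,
    "loss_weights": {
        "giou": 2.0,
        "l1": 5.0,
        "cls": 2.0,
        "contrast": 1.0,
    },
}


def learning_rate(epoch: int, base_lr: float = 1e-4) -> float:
    """Two-phase schedule (epochs 0-indexed): the base LR for the first
    30 epochs of semantic-guidance training, then decayed by a factor
    of 0.1 from the 31st epoch onward."""
    return base_lr if epoch < 30 else base_lr * 0.1


def contrastive_loss_enabled(epoch: int) -> bool:
    """The regional contrastive distillation loss is added starting
    from the 31st epoch (0-indexed epoch 30)."""
    return epoch >= 30
```

This split schedule reflects the paper's two training phases: semantic guidance alone first, then contrastive distillation with a reduced learning rate.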