Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning
Authors: Shaofeng Zhang, Qiang Zhou, Sitong Wu, Haoru Tan, Zhibin Wang, Jinfa Huang, Junchi Yan
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on standard datasets demonstrate state-of-the-art (SOTA) results. Compared to the previous SOTA method (PQCL), our approach achieves significant improvements on COCO: with 300 epochs of pretraining, CR2PQ obtains 3.4% mAP^bb and 2.1% mAP^mk improvements for detection and segmentation tasks, respectively. Furthermore, CR2PQ exhibits faster convergence, achieving 10.4% mAP^bb and 7.9% mAP^mk improvements over SOTA with just 40 epochs of pretraining. |
| Researcher Affiliation | Collaboration | 1Sch. of Computer Science & Sch. of Artificial Intelligence, Shanghai Jiao Tong University, 2INF Tech Co., Ltd., 3CUHK, 4HKU, 5Peking University. |
| Pseudocode | Yes | Code 1 shows the implementation of how to compute the relative coordinates matrix of rp B. Listing 1: Computing Relative Coordinates |
| Open Source Code | Yes | Code: https://github.com/Sherrylone/PQRoPE |
| Open Datasets | Yes | We conduct self-supervised pre-training on the ImageNet-1K (Deng et al., 2009) training set with 1,000 classes, as used in SSL for both MIM (He et al., 2021) and contrastive learning (Chen et al., 2020a). We also transfer the encoder pre-trained by CR2PQ to the MS-COCO (Lin et al., 2014) and ADE20K (Zhou et al., 2017) datasets. |
| Dataset Splits | Yes | MS COCO (Lin et al., 2014) is a large-scale object detection, segmentation, and captioning dataset: in particular, train 2017 and val 2017 splits contain 118K and 5K images, respectively. ... ADE20K (Zhou et al., 2017), which contains 150 fine-grained semantic categories and 25K training data. |
| Hardware Specification | Yes | The experiments are performed on a workstation with 32 V100 GPUs by default (if not otherwise specified). ... Specifically, we pre-train the ViT-Large with 800 epochs with batch size 2048, distributed on 16 A100 GPUs with the base learning rate 1.5e-4. |
| Software Dependencies | No | We follow the basic configuration of mmdetection (Chen et al., 2019) for fine-tuning Mask R-CNN (He et al., 2017) with FPN (Lin et al., 2017) under the standard 1x schedule. ... We follow all the configurations of mmsegmentation (Contributors, 2020) for fine-tuning Semantic FPN (Lin et al., 2017) with 40K iterations and an input resolution of 512 × 512. The paper mentions software tools like 'mmdetection' and 'mmsegmentation' along with their corresponding citations, but it does not specify exact version numbers for these or any other software components used. |
| Experiment Setup | Yes | In line with CAE (Chen et al., 2022), we train with AdamW (Loshchilov & Hutter, 2018) and a batch size of 2048, distributed over 32 GPUs using ViT-S/16 (batch size per GPU is 64). For ViT-B, the learning rate is linearly ramped up during the first 40 epochs to its base value determined with the following linear scaling rule (Chen et al., 2020a): blr = 1.5e-4, Batch Size = 2048, and lr = blr × Batch Size / 256. For ViT-S, we set blr as 1.75e-4. After warmup, we decay the learning rate with a cosine schedule (Loshchilov & Hutter, 2016). |
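The Pseudocode row refers to a listing that builds a relative-coordinates matrix between patch positions. The paper's actual Listing 1 is not reproduced here, but the core idea of pairwise relative coordinates on a patch grid can be sketched as follows; the function name and NumPy-based formulation are illustrative assumptions, not the released implementation:

```python
import numpy as np

def relative_coordinates(h, w):
    """Pairwise relative (dy, dx) offsets between all h*w patch positions.

    Returns an (h*w, h*w, 2) array where entry [i, j] is the grid coordinate
    of patch i minus the grid coordinate of patch j. Illustrative sketch only;
    the paper's listing operates on the coordinates used by the rotary
    positional query, which may be continuous rather than integer-valued.
    """
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=-1)   # (h*w, 2)
    # Broadcasting produces every pairwise difference in one step.
    return coords[:, None, :] - coords[None, :, :]          # (h*w, h*w, 2)
```

Note the antisymmetry `rc[i, j] == -rc[j, i]`, which is what lets a rotary-style encoding depend only on relative position.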
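The Experiment Setup row fully specifies the learning-rate schedule: linear warmup over the first 40 epochs to a peak value given by the scaling rule lr = blr × Batch Size / 256, followed by cosine decay. A minimal sketch of that schedule, with an illustrative function name (the defaults match the ViT-B numbers quoted above, blr = 1.5e-4 and batch size 2048):

```python
import math

def lr_at_epoch(epoch, total_epochs, warmup_epochs=40,
                blr=1.5e-4, batch_size=2048):
    """Per-epoch learning rate: linear warmup, then cosine decay to zero."""
    # Linear scaling rule from the setup: lr = blr * batch_size / 256.
    peak_lr = blr * batch_size / 256
    if epoch < warmup_epochs:
        # Ramp linearly from peak_lr / warmup_epochs up to peak_lr.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from peak_lr toward 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

With these defaults the peak rate is 1.5e-4 × 2048 / 256 = 1.2e-3; for ViT-S, pass `blr=1.75e-4` instead.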