Beyond Instance Consistency: Investigating View Diversity in Self-supervised Learning
Authors: Huaiyuan Qin, Muli Yang, Siyuan Hu, Peng Hu, Yu Zhang, Chen Gong, Hongyuan Zhu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive ablation studies, we demonstrate that SSL can still learn meaningful representations even when positive pairs lack strict instance consistency. Furthermore, our analysis reveals that increasing view diversity, by enforcing zero overlap or using smaller crop scales, can enhance downstream performance on classification and dense prediction tasks. We validate our findings across a range of settings, highlighting their robustness and applicability to diverse data sources. |
| Researcher Affiliation | Academia | Huaiyuan Qin EMAIL Institute for Infocomm Research (I2R), A*STAR, Singapore Muli Yang EMAIL Institute for Infocomm Research (I2R), A*STAR, Singapore Siyuan Hu EMAIL National University of Singapore, Singapore Peng Hu EMAIL Sichuan University, China Yu Zhang EMAIL Southeast University, China Chen Gong EMAIL Shanghai Jiaotong University, China Hongyuan Zhu EMAIL Institute for Infocomm Research (I2R), A*STAR, Singapore |
| Pseudocode | Yes | Algorithm A: Procedure for EMD-based Similarity Score |
| Open Source Code | No | The paper does not provide an explicit statement about the release of their source code or a link to a code repository for the methodology described in this paper. It mentions using existing toolboxes like MMDetection, MMRotate, Monocular-Depth-Estimation-Toolbox, and MMSegmentation for evaluation, but not their own implementation code. |
| Open Datasets | Yes | We conduct SSL pre-training on two datasets: COCO for non-iconic data and ImageNet-100 for object-centric data. COCO (Lin et al., 2014) is a large non-iconic dataset... ImageNet-100 is a subset of the object-centric dataset ImageNet-1K (Deng et al., 2009)... We evaluate the pre-trained models on a broad range of downstream evaluation tasks including classification, object detection, instance segmentation and depth prediction. For object detection, we use PASCAL VOC-0712 (Everingham et al., 2010) for general object detection, and DOTA-v1.0 (Xia et al., 2018) for aerial object detection. For classification, we utilize five small-scale classification datasets: CIFAR-10 (Krizhevsky et al., 2009a), CIFAR-100 (Krizhevsky et al., 2009b), DTD (Cimpoi et al., 2014), Oxford Pets (Parkhi et al., 2012), and STL-10 (Coates et al., 2011). Additionally, COCO (Lin et al., 2014) is included for the in-distribution evaluation on object detection and instance segmentation tasks. We also include depth prediction on NYUd (Silberman et al., 2012)... We provide additional validation results on the medical imaging domain... NIH Chest X-ray dataset (Wang et al., 2017). |
| Dataset Splits | Yes | For object detection, we use PASCAL VOC-0712 (Everingham et al., 2010) for general object detection, and DOTA-v1.0 (Xia et al., 2018) for aerial object detection... For MoCo-v2, we follow the evaluation protocol in Peng et al. (2022)... For DINO, we follow Caron et al. (2021)... To maintain consistency with standard classification evaluation, we filter out samples with multiple or missing labels, and report the Top-1 classification accuracy. |
| Hardware Specification | Yes | All pre-training and downstream experiments are conducted on NVIDIA RTX A6000 GPUs. Time is measured with a batch size of 256 on 8 NVIDIA RTX A6000 GPUs and an AMD EPYC 7543 32-core CPU, with the EMD solver from OpenCV. |
| Software Dependencies | No | The paper mentions several software components like PyTorch, MMDetection, MMRotate, Monocular-Depth-Estimation-Toolbox, MMSegmentation, and OpenCV, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | All models are pre-trained from scratch for 100 epochs. For the backbone, we use ResNet-50 (He et al., 2016) in MoCo-v2 (Chen et al., 2020b), and ViT-S (Dosovitskiy et al., 2021) with a patch size of 16 in DINO (Caron et al., 2021). Specifically, for MoCo-v2, we set the batch size as 256 and the learning rate as 0.3 with the SGD optimizer. For DINO, we set the batch size as 256 and the learning rate as 0.0005 with the AdamW optimizer. All other training hyper-parameters follow the original settings in their respective implementations. |
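The pseudocode row refers to the paper's Algorithm A, an EMD-based similarity score (computed with OpenCV's EMD solver per the hardware row). As a minimal sketch of the underlying idea only, and not the paper's implementation, the following pure-Python function computes the 1-D earth mover's distance between two equal-mass histograms, using the identity that 1-D EMD equals the summed absolute difference of the cumulative distributions; the derived similarity transform is likewise illustrative.

```python
def emd_1d(p, q):
    """1-D earth mover's distance between two histograms of equal total mass.

    For 1-D distributions on a grid with unit bin spacing, the EMD equals
    the sum of absolute differences of the cumulative distributions.
    """
    assert len(p) == len(q), "histograms must have the same number of bins"
    cum_diff = 0.0  # running difference of the two CDFs
    emd = 0.0
    for pi, qi in zip(p, q):
        cum_diff += pi - qi
        emd += abs(cum_diff)
    return emd


def emd_similarity(p, q):
    """Turn the distance into a similarity score in (0, 1]; illustrative only."""
    return 1.0 / (1.0 + emd_1d(p, q))
```

For the high-dimensional feature distributions used in the paper, a general linear-programming EMD solver (such as OpenCV's `cv2.EMD`) is required; this closed form applies only to the 1-D case.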
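The reported pre-training settings can be collected into a single configuration sketch. The dictionary layout and key names below are illustrative choices, not the authors' code; the values are the ones stated in the experiment-setup row.

```python
# Pre-training settings as reported in the paper (100 epochs, trained from scratch).
# The structure is a hypothetical summary, not the authors' configuration format.
PRETRAIN_CONFIGS = {
    "mocov2": {
        "backbone": "ResNet-50",
        "optimizer": "SGD",
        "batch_size": 256,
        "learning_rate": 0.3,
        "epochs": 100,
    },
    "dino": {
        "backbone": "ViT-S/16",  # ViT-S with a patch size of 16
        "optimizer": "AdamW",
        "batch_size": 256,
        "learning_rate": 5e-4,
        "epochs": 100,
    },
}
```

All other hyper-parameters follow the respective original implementations, so a faithful reproduction would merge this sketch over the upstream MoCo-v2 and DINO defaults rather than treat it as complete.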