On the Out-of-Distribution Generalization of Self-Supervised Learning

Authors: Wenwen Qiang, Jingyao Wang, Zeen Song, Jiangmeng Li, Changwen Zheng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we first introduce the datasets used in experiments. Next, we evaluate our method on multiple tasks, including unsupervised learning, semi-supervised learning, transfer learning, and few-shot learning. We introduce the experimental setups in the corresponding sections. Finally, we perform ablation studies. All results reported are the averages of five runs performed on NVIDIA RTX 4090 GPUs."
Researcher Affiliation | Academia | "1Institute of Software, Chinese Academy of Sciences, Beijing, China; 2University of the Chinese Academy of Sciences, Beijing, China. Correspondence to: Jiangmeng Li <EMAIL>."
Pseudocode | Yes | "Algorithm 1: Proposed Mini-Batch Sampling Strategy"
Open Source Code | Yes | "Code for the proposed sampling strategy can be found at https://github.com/ML-TASA/PID-SSL"
Open Datasets | Yes | "For unsupervised learning, we select ImageNet-100 (Tian et al., 2020) and ImageNet (Deng et al., 2009). For semi-supervised learning, we select ImageNet (Deng et al., 2009) for evaluation. For transfer learning, we select PASCAL VOC (Everingham et al., 2010) and COCO (Lin et al., 2014). For few-shot learning, we evaluate the proposed method on Omniglot (Lake et al., 2019), miniImageNet (Vinyals et al., 2016), and CIFAR-FS (Bertinetto et al., 2018)."
Dataset Splits | Yes | "In accordance with the standard protocol (Zbontar et al., 2021), we create two balanced subsets by sampling 1% and 10% of the training dataset. Specifically, we use the ImageNet dataset, a large-scale benchmark for visual recognition tasks, comprising 1.2 million images in 1,000 categories. The subsets contain 1% and 10% of the labeled training data, which are used for fine-tuning the model."
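The balanced-subset protocol quoted above (equal label fractions per class, in the style of Zbontar et al., 2021) can be sketched as follows. The function name and dataset representation are illustrative assumptions, not the authors' code:

```python
# Hypothetical sketch of balanced 1% / 10% subset sampling: take the same
# fraction of labeled examples from every class, so the subset preserves
# the class balance of the full training set.
import random
from collections import defaultdict

def balanced_subset(labels, fraction, seed=0):
    """Return sorted indices covering `fraction` of the data, balanced per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    subset = []
    for indices in by_class.values():
        k = max(1, int(len(indices) * fraction))  # keep at least one per class
        subset.extend(rng.sample(indices, k))
    return sorted(subset)

# Toy labeled set: 2 classes, 100 samples each.
labels = [0] * 100 + [1] * 100
one_percent = balanced_subset(labels, 0.01)   # 1 index per class
ten_percent = balanced_subset(labels, 0.10)   # 10 indices per class
```

On ImageNet this would yield roughly 12,000 and 120,000 labeled images for the 1% and 10% splits respectively, each with all 1,000 classes represented.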
Hardware Specification | Yes | "All results reported are the averages of five runs performed on NVIDIA RTX 4090 GPUs."
Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers.
Experiment Setup | Yes | "Experimental setup. Our proposed sampling strategy is compatible with any D-SSL or G-SSL model. In the standard training procedure of SSL, a mini-batch is randomly sampled from the training data before each iteration. In contrast, our method replaces this random sampling step with a structured mini-batch construction process defined by Algorithm 1. Specifically, our approach integrates seamlessly into existing SSL frameworks by substituting the mini-batch sampling component with Algorithm 1, while leaving all other aspects of the SSL training pipeline unchanged. As a result, the overall training procedure and hyperparameter settings remain identical to those used in the baseline methods. Therefore, for all our experiments, we retain the original hyperparameter configurations to ensure a fair and consistent comparison."
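The quoted setup says only the mini-batch sampling step changes while the rest of the SSL pipeline stays fixed. A minimal sketch of that plug-in pattern is below; the internals of the paper's Algorithm 1 are not reproduced here, and `structured_batches` is a placeholder assumption standing in for it:

```python
# Sketch of a training loop where the batch-construction step is the only
# pluggable component. Swapping `batch_fn` from random sampling to a
# structured sampler leaves the loop, loss, and hyperparameters untouched,
# mirroring the drop-in substitution described in the quoted setup.
import random

def random_batches(n, batch_size, rng):
    """Baseline SSL sampling: shuffle all indices, cut into mini-batches."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]

def train_epoch(data, batch_size, step_fn, batch_fn=random_batches, seed=0):
    """Run one epoch; `batch_fn` decides how mini-batches are formed."""
    rng = random.Random(seed)
    losses = []
    for batch in batch_fn(len(data), batch_size, rng):
        losses.append(step_fn([data[i] for i in batch]))
    return sum(losses) / len(losses)

# A structured sampler (e.g. the paper's Algorithm 1) would be swapped in as:
# train_epoch(data, 256, ssl_step, batch_fn=structured_batches)
```

This mirrors the `Sampler` abstraction in common deep-learning frameworks, which is presumably why the authors can retain all baseline hyperparameters unchanged.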