An Augmentation Overlap Theory of Contrastive Learning
Authors: Qi Zhang, Yifei Wang, Yisen Wang
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6.2, we conduct a series of experiments on a synthetic dataset to verify the influence of the augmentation strength. Besides the synthetic dataset, we also verify the practicality of our theoretical results on the real-world dataset CIFAR-10. The experiments are conducted on ImageNet with various data augmentations and various strengths. |
| Researcher Affiliation | Academia | ¹State Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; ²CSAIL, MIT; ³Institute for Artificial Intelligence, Peking University |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/PKU-ML/GARC. |
| Open Datasets | Yes | Figure 1: The t-SNE visualization of representations before and after the contrastive learning method SimCLR on the CIFAR-10 dataset. The experiments are conducted on ImageNet with various data augmentations and various strengths, including 1) different types of augmentations with default parameters in SimCLR, 2) RandomResizedCrop with different strengths, 3) ColorJitter with different strengths. Besides CIFAR-10, we also evaluate the effectiveness of ARC on the large-scale dataset ImageNet. |
| Dataset Splits | Yes | We take 5000 samples as the training set and 1000 samples as the test set. The augmentation graph of CIFAR-10 with different strength r of RandomResizedCrop. We choose a random subset of test images and randomly augment each one 20 times. The experiments are conducted on ImageNet with various data augmentations and various strengths. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) were found in the paper. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) were found in the paper. |
| Experiment Setup | Yes | For the encoder class F, a single-hidden-layer neural network with softmax activation and 256 output dimensions is adopted, which is trained by the InfoNCE loss. Our experiments are mainly conducted on the real-world dataset CIFAR-10. We use SimCLR as the training framework and ResNet-18 as the backbone network. When calculating ARC, we set the number of intra-anchor augmented views C = 6 and k = 1. We train the network for 200 epochs and use the encoder trained for 200 epochs as the final encoder and the encoder trained for 1 epoch as the initial encoder. |
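The setup above trains the encoder with the InfoNCE loss under a SimCLR-style framework. As a reference for what that objective computes, here is a minimal NumPy sketch of the standard SimCLR InfoNCE loss over a batch of paired augmented views; the temperature value and array shapes are illustrative assumptions, not parameters taken from the paper:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """SimCLR-style InfoNCE loss (illustrative sketch).

    z1, z2: (N, d) embeddings of two augmented views of the same N images.
    Each sample's positive is its other view; the remaining 2N - 2
    in-batch samples act as negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)              # (2N, d)
    sim = z @ z.T / temperature                       # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    n = z1.shape[0]
    # The positive for index i is i + n, and vice versa.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Cross-entropy of each row against its positive column.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()

# Illustrative usage with random embeddings (d = 16 is arbitrary here;
# the paper's encoder outputs 256 dimensions).
rng = np.random.default_rng(0)
z1 = rng.standard_normal((32, 16))
loss = info_nce_loss(z1, z1 + 0.1 * rng.standard_normal((32, 16)))
```

Well-aligned view pairs (each positive close to its anchor) yield a lower loss than mismatched pairs, which is the property contrastive training exploits.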