An Augmentation Overlap Theory of Contrastive Learning

Authors: Qi Zhang, Yifei Wang, Yisen Wang

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 6.2, we conduct a series of experiments on a synthetic dataset to verify the influence of the augmentation strength. Besides the synthetic dataset, we also verify the practicality of our theoretical results on the real-world dataset CIFAR-10. The experiments are conducted on ImageNet with various data augmentations and various strengths.
Researcher Affiliation | Academia | 1. State Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; 2. CSAIL, MIT; 3. Institute for Artificial Intelligence, Peking University
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code is available at https://github.com/PKU-ML/GARC.
Open Datasets | Yes | Figure 1: The t-SNE visualization of representations before and after the contrastive learning method SimCLR on the CIFAR-10 dataset. The experiments are conducted on ImageNet with various data augmentations and various strengths, including 1) different types of augmentations with default parameters in SimCLR, 2) RandomResizedCrop with different strengths, 3) ColorJitter with different strengths. Besides CIFAR-10, we also evaluate the effectiveness of ARC on the large-scale dataset ImageNet.
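The row above mentions RandomResizedCrop applied at different strengths. A minimal sketch of how a strength parameter r could control such a crop is given below; the mapping from r to the minimum retained area is an assumption for illustration, not the paper's exact parameterization.

```python
import random

def random_resized_crop(h, w, r, rng=random.Random(0)):
    """Sample a crop box in the spirit of RandomResizedCrop.

    r in [0, 1] is an assumed 'strength': stronger augmentation
    permits smaller crops, keeping as little as (1 - r) of the area.
    """
    min_scale = max(1.0 - r, 0.05)           # minimum fraction of area kept
    scale = rng.uniform(min_scale, 1.0)      # sampled crop-area fraction
    ch = max(1, round(h * scale ** 0.5))     # crop height
    cw = max(1, round(w * scale ** 0.5))     # crop width
    top = rng.randint(0, h - ch)             # top-left corner of the crop
    left = rng.randint(0, w - cw)
    return top, left, ch, cw
```

Varying r then sweeps the augmentation from nearly identity (r near 0) to aggressive cropping (r near 1), which is the kind of strength sweep the report describes.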
Dataset Splits | Yes | We take 5000 samples as the train set and 1000 samples as the test set. The augmentation graph of CIFAR-10 with different strength r of RandomResizedCrop. We choose a random subset of test images and randomly augment each one 20 times. The experiments are conducted on ImageNet with various data augmentations and various strengths.
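The split protocol quoted above (5000 train / 1000 test, with 20 random augmentations per chosen test image) can be sketched as follows; the total pool size and the use of per-view seeds are assumptions for illustration.

```python
import random

def make_splits_and_views(n_total=6000, n_train=5000, n_views=20, seed=0):
    """Split sample indices into train/test sets and attach one random
    seed per (test image, augmented view) pair."""
    rng = random.Random(seed)
    idx = list(range(n_total))
    rng.shuffle(idx)
    train_idx = idx[:n_train]      # 5000-sample train set
    test_idx = idx[n_train:]       # 1000-sample test set
    # One seed per view; a real pipeline would feed each seed to a
    # stochastic transform such as RandomResizedCrop.
    view_seeds = {i: [rng.random() for _ in range(n_views)] for i in test_idx}
    return train_idx, test_idx, view_seeds
```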
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) were found in the paper.
Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) were found in the paper.
Experiment Setup | Yes | For the encoder class F, a single-hidden-layer neural network with softmax activation and 256 output dimensions is adopted, which is trained with the InfoNCE loss. Our experiments are mainly conducted on the real-world dataset CIFAR-10. We use SimCLR as the training framework and ResNet18 as the backbone network. When calculating ARC, we set the number of intra-anchor augmented views C = 6 and k = 1. We train the network for 200 epochs and use the encoder trained for 200 epochs as the final encoder and the encoder trained for 1 epoch as the initial encoder.
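The setup above trains the encoder with the InfoNCE loss. A minimal NumPy sketch of that loss on two batches of augmented views is shown below; the temperature value 0.5 is an assumed default, not a figure taken from the paper.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """Minimal InfoNCE sketch: z1[i] and z2[i] embed two augmentations
    of sample i (positives); all other pairs act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # (N, N) similarities
    # Cross-entropy with the diagonal (matching views) as the positive class.
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because the diagonal entries are the maximal cosine similarities when the two views coincide, the loss then stays strictly below log N, the value of a uniform predictor over N candidates.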