Contrastive Attraction and Contrastive Repulsion for Representation Learning
Authors: Huangjie Zheng, Xu Chen, Jiangchao Yao, Hongxia Yang, Chunyuan Li, Ya Zhang, Hao Zhang, Ivor Tsang, Jingren Zhou, Mingyuan Zhou
TMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With our extensive experiments, CACR not only demonstrates good performance on CL benchmarks, but also shows better robustness when generalized on imbalanced image datasets. Code and pre-trained checkpoints are available at https://github.com/JegZheng/CACR-SSL. Our theoretical analysis reveals that CACR generalizes CL's behavior by positive attraction and negative repulsion, and it further considers the intra-contrastive relation within the positive and negative pairs to narrow the gap between the sampled and true distribution, which is important when datasets are less curated. Our experiments demonstrate the effectiveness of CACR in a variety of standard CL settings, with both convolutional and transformer-based architectures on various benchmark datasets. |
| Researcher Affiliation | Collaboration | Huangjie Zheng EMAIL Department of Statistics and Data Science, The University of Texas at Austin; Xu Chen EMAIL Shanghai Jiao Tong University, Alibaba Group; Jiangchao Yao EMAIL Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai AI Laboratory; Hongxia Yang EMAIL Shanghai Institute for Advanced Study of Zhejiang University (SIAS); Chunyuan Li EMAIL Microsoft Research, Redmond; Ya Zhang EMAIL Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai AI Laboratory; Hao Zhang EMAIL Xidian University; Ivor Tsang EMAIL A*STAR Centre for Frontier AI Research (CFAR); Jingren Zhou EMAIL Alibaba Group; Mingyuan Zhou EMAIL McCombs School of Business, The University of Texas at Austin |
| Pseudocode | Yes | Algorithm 1 PyTorch-like Augmentation Code on CIFAR-10, CIFAR-100 and STL-10; Algorithm 2 PyTorch-like Augmentation Code on ImageNet-100 and ImageNet-1K; Algorithm 3 PyTorch-like style pseudo-code of CACR with MoCo-v2 at each iteration. |
| Open Source Code | Yes | Code and pre-trained checkpoints are available at https://github.com/JegZheng/CACR-SSL. |
| Open Datasets | Yes | In this section, we first study the CACR behaviors with small-scale experiments, where we use CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009) and create two class-imbalanced CIFAR datasets as empirical verification of our theoretical analysis. For large-scale datasets, we use ImageNet-1K (Deng et al., 2009) and compare with the state-of-the-art frameworks (He et al., 2020; Zbontar et al., 2021; Chen et al., 2020a; Caron et al., 2020; Grill et al., 2020; Huynh et al., 2020) on linear probing, where we report the Top-1 validation accuracy on ImageNet-1K data. To further justify our analysis, we also leverage two large-scale but label-imbalanced datasets (WebVision v1 and ImageNet-22K) for linear-probing pretraining. |
| Dataset Splits | Yes | For evaluation we keep the standard validation/testing datasets. The linear classifier is trained on ImageNet-1K on top of fixed representations of the pretrained ResNet50 encoder. The model is tuned on the train2017 set and evaluated on the val2017 set. |
| Hardware Specification | Yes | On small-scale datasets, all experiments are conducted on a single GPU, including NVIDIA 1080 Ti and RTX 3090; on large-scale datasets, all experiments are done on 8 Tesla-V100-32G GPUs. Table 14: GPU time (s) per iteration of CACR w.r.t. different K on CIFAR-10 with AlexNet framework (mini-batch size is 128), tested on Tesla-V100 GPU. Table 17: GPU time (s) per iteration of different loss on MoCo-v2 framework, tested on 32G-V100 GPU. |
| Software Dependencies | No | The paper includes 'PyTorch-like' pseudocode in Algorithms 1, 2, and 3, and mentions 'detectron2 (Wu et al., 2019)' for object detection and segmentation. However, no specific version numbers are provided for PyTorch, torchvision, or detectron2. |
| Experiment Setup | Yes | We apply the mini-batch SGD with 0.9 momentum and 1e-4 weight decay. The learning rate is linearly scaled as 0.12 per 256 batch size (Goyal et al., 2017). The optimization is done over 200 epochs, and the learning rate is decayed by a factor of 0.1 at epoch 155, 170, and 185. Specifically, the temperature parameter of CL is τ = 0.19, the hyper-parameters of AU-CL are t = 2.0, τ = 0.19, and the hyper-parameters of HN-CL are τ = 0.5, β = 1.0, which show the best performance according to our tuning. For CACR, in both single and multi-positive sample settings, we set t+ = 1.0 for all small-scale datasets. As for t−, for CACR (K = 1), t− is 2.0, 3.0, and 3.0 on CIFAR-10, CIFAR-100, and STL-10, respectively. For CACR (K = 4), t− is 0.9, 2.0, and 2.0 on CIFAR-10, CIFAR-100, and STL-10, respectively. |
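The learning-rate recipe quoted in the Experiment Setup row combines the linear scaling rule of Goyal et al. (2017) with a three-milestone step decay. A minimal plain-Python sketch of that arithmetic is below; the helper names `scaled_lr` and `lr_at_epoch` are hypothetical (the paper itself would use PyTorch's `SGD` plus `MultiStepLR`), and only the constants (0.12 per 256 batch size, decay 0.1 at epochs 155/170/185) come from the quoted text.

```python
def scaled_lr(batch_size, base=0.12, ref_batch=256):
    """Linear scaling rule: lr = 0.12 * (batch_size / 256)."""
    return base * batch_size / ref_batch


def lr_at_epoch(epoch, init_lr, milestones=(155, 170, 185), gamma=0.1):
    """Step decay: multiply the lr by gamma at each milestone epoch reached."""
    lr = init_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr


# Example: batch size 512 doubles the base lr; by epoch 190 all three
# decays have been applied, so the lr is init_lr * 0.1**3.
init = scaled_lr(512)          # 0.24
final = lr_at_epoch(190, init)  # 0.24 * 0.001 = 0.00024
```

In PyTorch this corresponds to `torch.optim.SGD(..., lr=scaled_lr(bs), momentum=0.9, weight_decay=1e-4)` wrapped with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[155, 170, 185], gamma=0.1)`.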