Contrastive Learning with Consistent Representations
Authors: Zihu Wang, Yu Wang, Zhuotong Chen, Hanbin Hu, Peng Li
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that CoCor notably enhances the generalizability and transferability of learned representations in comparison to baseline methods. The classification accuracies for ImageNet-100 and ImageNet-1K pre-trained encoders are presented in Table 1 and Table 2. Object detection: We fine-tune the pre-trained encoders on VOC2007+2012 Everingham et al. (2009) and COCO2017 Lin et al. (2014) datasets for object detection downstream tasks. |
| Researcher Affiliation | Academia | Zihu Wang, Yu Wang, Zhuotong Chen, Hanbin Hu, and Peng Li: Department of Electrical and Computer Engineering, University of California, Santa Barbara (all authors). |
| Pseudocode | Yes | Algorithm 1: Algorithm flow of CoCor. Input: initial encoder, MMNN, and classifier parameters θ_e^(0), θ_d^(0), and θ_c^(0); number of training epochs N; unlabeled dataloader D_u; labeled dataloader D_l. For i = 1 to N: (1) sample unlabeled x_u and labeled data (x_l, y_l) from D_u and D_l, respectively; (2) call Equation 5 to update the encoder's parameters θ_e^(i) on x_u with MMNN parameters θ_d^(i-1) fixed; (3) call Equation 4 to update MMNN θ_d^(i) and classifier θ_c^(i) on (x_l, y_l) with the encoder parameters θ_e^(i) fixed. Output: trained encoder with parameters θ_e^(N). |
| Open Source Code | Yes | The implementation of CoCor can be found at https://github.com/zihuwang97/CoCor. |
| Open Datasets | Yes | The encoders of the baseline methods are pre-trained on the large ImageNet-1K Russakovsky et al. (2015) dataset and its subset ImageNet-100 Tian et al. (2020a), under two different backbone encoder architectures, ResNet-50 and ResNet-34 He et al. (2016). Linear evaluation is conducted on the following datasets: Cifar-10/100 Krizhevsky et al. (2009), CUB-200 Wah et al. (2011), Caltech-101 Fei-Fei et al. (2004), SUN397 Xiao et al. (2010), Food101 Bossard et al. (2014), Flowers102 Nilsback & Zisserman (2008), Oxford-IIIT Pet (Pets) Parkhi et al. (2012), Aircraft Maji et al. (2013), and Stanford Cars Krause et al. (2013). We fine-tune the pre-trained encoders on VOC2007+2012 Everingham et al. (2009) and COCO2017 Lin et al. (2014) datasets for object detection downstream tasks. |
| Dataset Splits | Yes | Linear evaluation: Each pre-trained encoder is parameter-frozen and paired with a linear classifier, which is fine-tuned, following the linear evaluation protocol of Krizhevsky et al. (2017), as detailed in the appendices. Linear evaluation is conducted on the following datasets: Cifar-10/100 Krizhevsky et al. (2009), CUB-200 Wah et al. (2011), Caltech-101 Fei-Fei et al. (2004), SUN397 Xiao et al. (2010), Food101 Bossard et al. (2014), Flowers102 Nilsback & Zisserman (2008), Oxford-IIIT Pet (Pets) Parkhi et al. (2012), Aircraft Maji et al. (2013), and Stanford Cars Krause et al. (2013). We follow the linear evaluation protocol of Chen et al. (2020a); Lee et al. (2021); Kornblith et al. (2019). A one-layer linear classifier is trained upon the frozen pre-trained backbone. Resize, CenterCrop, and Normalization are used as the training transformations. An L-BFGS optimizer is adopted for minimizing the ℓ2-regularized cross-entropy loss. The optimal regularization parameter is selected on the validation set of each dataset. |
| Hardware Specification | Yes | Training time comparison on a machine with 4 A100 GPUs. Training time (in hours) / Top-1 linear evaluation results (in %) on ImageNet-1K are provided for comparison. |
| Software Dependencies | No | The paper mentions using 'Detectron2 Wu et al. (2019)' but does not specify a version number or other software dependencies with their versions. For example, it does not mention Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | The encoders of MoCo- and SimSiam-based methods are pre-trained for 200 epochs, while SupCon's encoder undergoes 100 epochs of pre-training. Batch size of all pre-training experiments is set to 256. We follow Khosla et al. (2020); Chen et al. (2020b); Chen & He (2021) for other settings of these baseline methods. MoCo He et al. (2020); Chen et al. (2020b): We use a starting learning rate of 3e-2, a weight decay of 1e-4, and a momentum of 0.9 for the optimizer. For the dual encoders of MoCo, the feature space dimension is 128, the memory queue size is 65536, and the momentum of updating the key encoder is 0.999. |
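The alternating update in Algorithm 1 (encoder updated with the MMNN fixed, then MMNN and classifier updated with the encoder fixed) can be sketched as plain block-coordinate descent. The sketch below is a toy illustration only: scalar stand-ins replace the neural-network parameters, and the two quadratic "losses" are hypothetical placeholders for the paper's Equations 5 and 4.

```python
def train_cocor(theta_e=0.0, theta_d=0.0, theta_c=0.0,
                epochs=100, lr=0.1):
    """Toy sketch of CoCor's Algorithm 1 with scalar parameters.

    theta_e, theta_d, theta_c stand in for the encoder, MMNN, and
    classifier parameters; the loss terms are illustrative quadratics,
    not the paper's actual objectives.
    """
    for _ in range(epochs):
        # Step 1 (cf. Equation 5): update the encoder on unlabeled data
        # with the MMNN parameters held fixed at their previous value.
        grad_e = 2.0 * (theta_e - theta_d - 1.0)  # toy unsupervised loss
        theta_e -= lr * grad_e

        # Step 2 (cf. Equation 4): update the MMNN and classifier on
        # labeled data with the (freshly updated) encoder held fixed.
        grad_d = 2.0 * (theta_d - 0.5 * theta_e)  # toy supervised loss
        grad_c = 2.0 * (theta_c - theta_e)
        theta_d -= lr * grad_d
        theta_c -= lr * grad_c
    return theta_e, theta_d, theta_c
```

With these toy objectives the alternation converges to the joint fixed point (here θ_e ≈ 2, θ_d ≈ 1, θ_c ≈ 2), mirroring how each block is optimized while the other is frozen.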
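The linear-evaluation protocol quoted above trains a single linear classifier on frozen backbone features with an ℓ2-regularized cross-entropy loss. A dependency-free sketch is given below; note the paper uses an L-BFGS optimizer, whereas plain gradient descent is substituted here, and all function names are illustrative.

```python
import math

def train_linear_probe(features, labels, n_classes,
                       reg=1e-3, lr=0.5, steps=200):
    """Softmax linear probe on frozen features.

    Minimizes l2-regularized cross-entropy by gradient descent
    (a stand-in for the L-BFGS optimizer used in the paper).
    """
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    n = len(features)
    for _ in range(steps):
        # l2-regularization term of the gradient.
        grad = [[reg * W[c][j] for j in range(dim)] for c in range(n_classes)]
        for x, y in zip(features, labels):
            logits = [sum(w_j * x_j for w_j, x_j in zip(W[c], x))
                      for c in range(n_classes)]
            mx = max(logits)                       # stabilized softmax
            exps = [math.exp(l - mx) for l in logits]
            z = sum(exps)
            for c in range(n_classes):
                err = exps[c] / z - (1.0 if c == y else 0.0)
                for j in range(dim):
                    grad[c][j] += err * x[j] / n
        for c in range(n_classes):
            for j in range(dim):
                W[c][j] -= lr * grad[c][j]
    return W

def predict(W, x):
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]
    return scores.index(max(scores))
```

In the actual protocol, `features` would be the frozen encoder's outputs after the Resize / CenterCrop / Normalization transforms, and the regularization strength `reg` would be selected on each dataset's validation split.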
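The key-encoder momentum of 0.999 mentioned in the MoCo settings refers to an exponential moving average of the query encoder's parameters. A minimal sketch, assuming parameters are plain lists of floats rather than network tensors:

```python
def momentum_update(key_params, query_params, m=0.999):
    """MoCo-style momentum update: the key encoder is an exponential
    moving average of the query encoder, theta_k <- m*theta_k + (1-m)*theta_q.
    """
    return [m * k + (1.0 - m) * q
            for k, q in zip(key_params, query_params)]
```

With m = 0.999 the key encoder changes very slowly, which keeps the representations of the keys in the 65536-entry memory queue consistent across training steps.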