Mini-Batch Optimization of Contrastive Loss
Authors: Jaewoong Cho, Kartik Sreenivasan, Keon Lee, Kyunghoo Mun, Soheun Yi, Jeong-Gwan Lee, Anna Lee, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results validate our theoretical findings and demonstrate that our proposed algorithm outperforms vanilla SGD, providing a better understanding of mini-batch optimization in contrastive learning. |
| Researcher Affiliation | Collaboration | Jaewoong Cho (KRAFTON), Kartik Sreenivasan (Databricks), Keon Lee (KRAFTON), Kyunghoo Mun (Carnegie Mellon University), Soheun Yi (Carnegie Mellon University), Jeong-Gwan Lee (KRAFTON), Anna Lee (KRAFTON), Jy-yong Sohn (Yonsei University), Dimitris Papailiopoulos (University of Wisconsin-Madison), Kangwook Lee (University of Wisconsin-Madison; KRAFTON) |
| Pseudocode | Yes | Algorithm 1: Spectral Clustering Method Algorithm 2: The direct application of OSGD to our problem Algorithm 3: SGD with replacement Algorithm 4: SGD without replacement Algorithm 5: OSGD Algorithm 6: OSGD without replacement |
| Open Source Code | Yes | Details of the experimental setting can be found in Appendix E, and our code is available at https://github.com/krafton-ai/mini-batch-cl. |
| Open Datasets | Yes | We validate our theoretical findings and the efficacy of the proposed method by providing experimental results on synthetic and real datasets. ... We also conduct experiments by pre-training on CIFAR-100 (Krizhevsky et al., 2009) and Tiny ImageNet (Le & Yang, 2015) using the proposed method. |
| Dataset Splits | No | We measure the performance of the models on the retrieval task, defined as finding the positive-pair image of a given image among all pairs (the number of images in the validation dataset). ... This process is iterated across 10K CIFAR-100 images (10K Tiny ImageNet images). The text describes the number of images used for the retrieval task (validation/test) but does not explicitly state the training, validation, and test splits (e.g., percentages or counts) for the model training phase. |
| Hardware Specification | Yes | All learning is executed on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The training code and hyperparameters are based on the official codebase of SogCLR (Yuan et al., 2022). We use the LARS optimizer (You et al., 2017) with momentum 0.9 and weight decay 10⁻⁶. The paper names specific optimizers and refers to a codebase, but does not provide version numbers for software libraries or environments such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use learning rate η = 0.5 and apply the normalization step at every iteration. We conduct mini-batch contrastive learning with mini-batch size B = 32 using ResNet18-based encoders on the CIFAR-100 and Tiny ImageNet datasets. ... We use the LARS optimizer (You et al., 2017) with momentum 0.9 and weight decay 10⁻⁶. We utilize a learning-rate scheduler that starts with a warm-up phase in the initial 10 epochs, during which the learning rate increases linearly to the maximum value η_max = 0.075·√B. After this warm-up stage, we employ a cosine annealing (half-cycle) schedule for the remaining epochs. For the approximated OSGD, we employ k = 1500, q = 150. ... We train models for a total of 100 epochs. |
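The loss being mini-batch-optimized here is an InfoNCE/NT-Xent-style contrastive loss over B positive pairs. As a rough illustration only (this is not the authors' code; the function name, temperature value, and toy data are ours), a minimal NumPy sketch of such a mini-batch contrastive loss:

```python
import numpy as np

def nt_xent_loss(u, v, temperature=0.5):
    """Sketch of a mini-batch contrastive (InfoNCE-style) loss.

    u, v: (B, d) L2-normalized embeddings of two views; row i of u and
    row i of v form a positive pair, and all other rows act as negatives.
    """
    B = u.shape[0]
    # Cosine-similarity logits between every u_i and every v_j.
    logits = (u @ v.T) / temperature                       # (B, B)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against the diagonal (the positive pairs).
    return -log_probs[np.arange(B), np.arange(B)].mean()

# Toy usage with B = 4 unit-norm embeddings.
rng = np.random.default_rng(0)
u = rng.normal(size=(4, 8)); u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.normal(size=(4, 8)); v /= np.linalg.norm(v, axis=1, keepdims=True)
loss = nt_xent_loss(u, v)
```

With a batch size of B, only B² of the N² pairwise terms of the full-batch loss appear in each step, which is the gap between mini-batch and full-batch contrastive optimization that the paper analyzes.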
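The learning-rate schedule quoted above (10-epoch linear warm-up to η_max, then half-cycle cosine annealing over the remaining 90 of 100 epochs) can be sketched as follows; the function name is ours, and η_max = 0.075·√B is our reading of the SimCLR-style LARS scaling the paper appears to use:

```python
import math

def lr_schedule(epoch, total_epochs=100, warmup_epochs=10, batch_size=32):
    """Linear warm-up followed by half-cycle cosine annealing (sketch)."""
    eta_max = 0.075 * math.sqrt(batch_size)  # assumed SimCLR-style peak LR
    if epoch < warmup_epochs:
        # Linear ramp from eta_max/warmup_epochs up to eta_max.
        return eta_max * (epoch + 1) / warmup_epochs
    # Half-cycle cosine decay from eta_max toward 0.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return eta_max * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch reproduction this would typically be replaced by a warm-up wrapper around `torch.optim.lr_scheduler.CosineAnnealingLR`, stepped once per epoch.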