Mini-Batch Optimization of Contrastive Loss
Authors: Jaewoong Cho, Kartik Sreenivasan, Keon Lee, Kyunghoo Mun, Soheun Yi, Jeong-Gwan Lee, Anna Lee, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results validate our theoretical findings and demonstrate that our proposed algorithm outperforms vanilla SGD, providing a better understanding of mini-batch optimization in contrastive learning. |
| Researcher Affiliation | Collaboration | Jaewoong Cho (KRAFTON), Kartik Sreenivasan (Databricks), Keon Lee (KRAFTON), Kyunghoo Mun (Carnegie Mellon University), Soheun Yi (Carnegie Mellon University), Jeong-Gwan Lee (KRAFTON), Anna Lee (KRAFTON), Jy-yong Sohn (Yonsei University), Dimitris Papailiopoulos (University of Wisconsin-Madison), Kangwook Lee (University of Wisconsin-Madison; KRAFTON) |
| Pseudocode | Yes | Algorithm 1: Spectral Clustering Method Algorithm 2: The direct application of OSGD to our problem Algorithm 3: SGD with replacement Algorithm 4: SGD without replacement Algorithm 5: OSGD Algorithm 6: OSGD without replacement |
| Open Source Code | Yes | Details of the experimental setting can be found in Appendix E, and our code is available at https://github.com/krafton-ai/mini-batch-cl. |
| Open Datasets | Yes | We validate our theoretical findings and the efficacy of the proposed method by providing experimental results on synthetic and real datasets. ... We also conduct experiments by pre-training on CIFAR-100 (Krizhevsky et al., 2009) and Tiny ImageNet (Le & Yang, 2015) using the proposed method. |
| Dataset Splits | No | We measure the performance of the models on the retrieval task, defined as finding the positive-pair image of a given image among all pairs (the number of images in the validation dataset). ... This process is iterated across 10K CIFAR-100 images (10K Tiny ImageNet images). The text describes the number of images used for the retrieval task (validation/test) but does not explicitly state the training, validation, and test splits (e.g., percentages or counts) for the model training phase. |
| Hardware Specification | Yes | All learning is executed on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The training code and hyperparameters are based on the official codebase of SogCLR (Yuan et al., 2022). We use the LARS optimizer (You et al., 2017) with momentum 0.9 and weight decay 10⁻⁶. The paper names specific optimizers and refers to a codebase, but does not provide version numbers for software libraries or environments such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use learning rate η = 0.5 and apply the normalization step at every iteration. We conduct mini-batch contrastive learning with mini-batch size B = 32 using ResNet18-based encoders on the CIFAR-100 and Tiny ImageNet datasets. ... We use the LARS optimizer (You et al., 2017) with momentum 0.9 and weight decay 10⁻⁶. We utilize a learning-rate scheduler that starts with a warm-up phase in the initial 10 epochs, during which the learning rate increases linearly to the maximum value η_max = 0.075·√B. After this warm-up stage, we employ a cosine annealing (half-cycle) schedule for the remaining epochs. For the approximated OSGD, we employ k = 1500, q = 150. ... We train models for a total of 100 epochs. |
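The loss being mini-batch-optimized here is an InfoNCE/NT-Xent-style contrastive loss over B positive pairs. As a rough illustration only (this is not the authors' code; the function name, temperature value, and toy data are ours), a minimal NumPy sketch of such a mini-batch contrastive loss:

```python
import numpy as np

def nt_xent_loss(u, v, temperature=0.5):
    """Sketch of a mini-batch contrastive (InfoNCE-style) loss.

    u, v: (B, d) L2-normalized embeddings of two views; row i of u and
    row i of v form a positive pair, and all other rows act as negatives.
    """
    B = u.shape[0]
    # Cosine-similarity logits between every u_i and every v_j.
    logits = (u @ v.T) / temperature                       # (B, B)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against the diagonal (the positive pairs).
    return -log_probs[np.arange(B), np.arange(B)].mean()

# Toy usage with B = 4 unit-norm embeddings.
rng = np.random.default_rng(0)
u = rng.normal(size=(4, 8)); u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.normal(size=(4, 8)); v /= np.linalg.norm(v, axis=1, keepdims=True)
loss = nt_xent_loss(u, v)
```

With a batch size of B, only B² of the N² pairwise terms of the full-batch loss appear in each step, which is the gap between mini-batch and full-batch contrastive optimization that the paper analyzes.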
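The learning-rate schedule quoted above (10-epoch linear warm-up to η_max, then half-cycle cosine annealing over the remaining 90 of 100 epochs) can be sketched as follows; the function name is ours, and η_max = 0.075·√B is our reading of the SimCLR-style LARS scaling the paper appears to use:

```python
import math

def lr_schedule(epoch, total_epochs=100, warmup_epochs=10, batch_size=32):
    """Linear warm-up followed by half-cycle cosine annealing (sketch)."""
    eta_max = 0.075 * math.sqrt(batch_size)  # assumed SimCLR-style peak LR
    if epoch < warmup_epochs:
        # Linear ramp from eta_max/warmup_epochs up to eta_max.
        return eta_max * (epoch + 1) / warmup_epochs
    # Half-cycle cosine decay from eta_max toward 0.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return eta_max * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch reproduction this would typically be replaced by a warm-up wrapper around `torch.optim.lr_scheduler.CosineAnnealingLR`, stepped once per epoch.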