On the Similarities of Embeddings in Contrastive Learning

Authors: Chungpa Lee, Sehee Lim, Kibok Lee, Jy-Yong Sohn

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that incorporating the proposed loss improves performance in small-batch settings. In this section, we empirically validate the impact of our theoretical results discussed in Sec. 5, especially in the practical scenario of mini-batch training. First, we empirically observe that the excessive separation of negative pairs (proven in Theorem 5.5) actually occurs in experiments on benchmark datasets. Second, we empirically confirm that this excessive separation can be mitigated by the proposed loss in Def. 5.7, which reduces the variance of the negative-pair similarities. Third, we observe that this variance reduction improves the quality of learned representations in various real-world experiments.
Researcher Affiliation | Academia | Department of Statistics and Data Science, Yonsei University, Seoul, Korea. Correspondence to: Jy-yong Sohn <EMAIL>
Pseudocode | No | The paper defines contrastive loss formulations (Definitions 3.1 and 3.2) and a variance-reduction auxiliary loss (Definition 5.7), but it does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our implementation is based on the open-source library solo-learn (da Costa et al., 2022) for self-supervised learning. The source code is available at https://github.com/leechungpa/embedding-similarity-cl/.
Open Datasets | Yes | The models are pretrained on CIFAR-100 (Krizhevsky et al., 2009)... We pretrain models on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009) using various contrastive losses...
Dataset Splits | Yes | We pretrain models on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009) using various contrastive losses... For the linear evaluation protocol, we remove the projector head and use the pretrained encoder for downstream classification tasks... we report top-1 accuracy on the downstream dataset.
Hardware Specification | Yes | All experiments were conducted using a single NVIDIA RTX 4090 GPU.
Software Dependencies | No | Our implementation is based on the open-source library solo-learn (da Costa et al., 2022) for self-supervised learning. The paper mentions this library but does not provide specific version numbers for it or for other key software dependencies (e.g., Python, PyTorch).
Experiment Setup | Yes | For all experiments, we use a modified ResNet-18... and ResNet-50 for ImageNet-100... we attach a 2-layer MLP as the projection head... For the CIFAR datasets, the crop size is set to 32, while for ImageNet-100 we use a crop size of 224... we use stochastic gradient descent (SGD) for 200 epochs. The learning rate is scaled linearly with the batch size as lr = base_lr × BatchSize/256, where the base learning rate is set to 0.3 for the CIFAR datasets and 0.1 for ImageNet-100. A cosine decay schedule is applied, with a weight decay of 0.0001 and SGD momentum set to 0.9. Additionally, we use linear warmup for the first 10 epochs. We tune the temperature parameter for baseline methods... by performing a grid search over the range 0.1 to 0.5 in increments of 0.1... For tuning the proposed loss L_VRNS(U, V) in Def. 5.7, we conducted a grid search for λ over the set {0.1, 0.3, 1, 3, 10, 30, 100}.
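The excerpts above describe the proposed loss of Def. 5.7 only at a high level: an auxiliary term that reduces the variance of negative-pair similarities, combined with a standard contrastive loss via a weight λ. The exact formulation is not reproduced in this report, so the following NumPy sketch is only an illustration of that idea, not the paper's definition; all function names are hypothetical.

```python
import numpy as np

def info_nce(u, v, temperature=0.3):
    """Standard InfoNCE over a mini-batch of paired embeddings (rows of u, v)."""
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    logits = u @ v.T / temperature                  # (B, B) cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # positives sit on the diagonal

def negative_similarity_variance(u, v):
    """Variance of cosine similarities over negative (off-diagonal) pairs."""
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    sim = u @ v.T
    off_diag = sim[~np.eye(len(u), dtype=bool)]     # drop the positive pairs
    return off_diag.var()

def total_loss(u, v, lam=1.0, temperature=0.3):
    """InfoNCE plus a lambda-weighted variance penalty on negative-pair
    similarities, mirroring the role (not the exact form) of Def. 5.7."""
    return info_nce(u, v, temperature) + lam * negative_similarity_variance(u, v)
```

Per the setup above, λ would be tuned by grid search over {0.1, 0.3, 1, 3, 10, 30, 100}; the paper's actual implementation is in PyTorch via solo-learn.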
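The learning-rate recipe in the setup (linear scaling by batch size, 10-epoch linear warmup, then cosine decay over 200 epochs) can be sketched as follows. The helper name and per-epoch granularity are assumptions; solo-learn may step the schedule per iteration rather than per epoch.

```python
import math

def lr_at_epoch(epoch, batch_size, base_lr=0.3,
                warmup_epochs=10, total_epochs=200):
    """Learning rate at a given epoch: linear batch-size scaling,
    linear warmup, then cosine decay (hypothetical helper)."""
    scaled_lr = base_lr * batch_size / 256          # lr = base_lr * BatchSize / 256
    if epoch < warmup_epochs:
        # Linear warmup over the first `warmup_epochs` epochs.
        return scaled_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return scaled_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For ImageNet-100 one would pass `base_lr=0.1`, matching the setup quoted above.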