$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

Authors: Vlad Sobal, Mark Ibrahim, Randall Balestriero, Vivien Cabannes, Diane Bouchacourt, Pietro Astolfi, Kyunghyun Cho, Yann LeCun

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our study spans three scales: ImageNet-1k with 1 million, CC3M with 3 million, and CC12M with 12 million samples. The representations learned via our objective outperform both contrastive self-supervised and vision-language models trained on the same data across a range of tasks.
Researcher Affiliation | Collaboration | ¹Meta FAIR, ²New York University, ³Brown University, ⁴Genentech, ⁵CIFAR
Pseudocode | Yes | Figure 1: a) The diagram of X-CLR. The X-CLR objective learns representations of images with the help of a soft relationship graph. The graph can be built from accompanying data, e.g. a taxonomy for biological data. In our experiments, we use captioned images and build similarities based on caption similarities. b) Python-style pseudo-code of X-CLR with similarity based on text captions.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | We test X-CLR on three datasets of varying scale: ImageNet (Deng et al., 2009) (1M), and Conceptual Captions 3M and 12M (Sharma et al., 2018). ... We test on ImageNet classification with standard as well as with ImageNet Real labels, on ImageNet-9 to test robustness to background change (we refer to this as Background Decomposition in our results), on ObjectNet to test robustness to context and view change, and on MIT-States objects and attributes classification to test how well the model captures object states.
Dataset Splits | Yes | MIT-States: In order to evaluate on this dataset using linear probing, we split the dataset randomly into two even parts, one used for training the linear layer, the other for evaluation. ... ImageNet-9 (Xiao et al., 2020) proposes multiple benchmarks to test model robustness to background perturbation. The benchmark is created by taking samples from ImageNet, segmenting the object in the scene, and swapping out the background. Since the benchmark uses the same classes as ImageNet, we do not retrain the ImageNet classifier.
Hardware Specification | Yes | To train on ImageNet, we used 8 Nvidia V100s, and each run took about 30 hours.
Software Dependencies | No | We use the Sentence Transformer (Reimers and Gurevych, 2019) as the text encoder to construct similarities unless stated otherwise. ... We used the NLTK library (Bird et al., 2009) and the Wu-Palmer similarity (Wu and Palmer, 1994) between the class synsets. ... The paper mentions software components but does not provide specific version numbers for them.
Experiment Setup | Yes | All experiments on the ImageNet dataset were run for 100 epochs with a batch size of 1024. The learning rate was set to 0.075 for ImageNet models. For experiments on CC3M and CC12M, we used the standard SimCLR augmentations and a learning rate of 0.1. ... We train SimCLR, SupCon and X-CLR using the LARS optimizer (You et al., 2017). ... The output dimension of the projector is 128.
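The core idea described above (a contrastive loss whose targets come from a soft similarity graph over captions rather than one-hot positives) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `xclr_loss`, the two temperature parameters, and the use of random embeddings in place of a real image/text encoder are all assumptions for the sake of a runnable example.

```python
import torch
import torch.nn.functional as F


def xclr_loss(image_embeds, caption_embeds, temperature=0.1, target_temperature=0.1):
    """Sketch of an X-CLR-style soft contrastive loss.

    Unlike SimCLR's one-hot positives, the target distribution over the batch
    is derived from pairwise caption similarities (the "similarity graph").
    The paper builds caption embeddings with a Sentence Transformer; here any
    text embedding would do.
    """
    # Normalize so that dot products are cosine similarities.
    z = F.normalize(image_embeds, dim=-1)
    c = F.normalize(caption_embeds, dim=-1)

    # Soft targets: softmax over caption-caption similarities.
    targets = ((c @ c.T) / target_temperature).softmax(dim=-1)

    # Predicted distribution: softmax over image-image similarities.
    logits = (z @ z.T) / temperature

    # Cross-entropy between the soft target graph and the predictions.
    return -(targets * logits.log_softmax(dim=-1)).sum(dim=-1).mean()
```

Setting the target distribution to the identity matrix recovers a SimCLR-like one-hot objective, which is why this formulation is a strict generalization of standard contrastive learning.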
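The hyperparameters reported in the experiment-setup row can be collected into a single config fragment for reference. The values below come from the quotes above; the dictionary field names are illustrative, not taken from the authors' code.

```python
# Reported ImageNet training setup (values from the paper's text;
# key names are illustrative placeholders).
IMAGENET_CONFIG = {
    "epochs": 100,
    "batch_size": 1024,
    "learning_rate": 0.075,
    "optimizer": "LARS",          # You et al., 2017
    "projector_output_dim": 128,
}

# Reported CC3M / CC12M setup; augmentations follow standard SimCLR.
CC_CONFIG = {
    "learning_rate": 0.1,
    "augmentations": "standard SimCLR",
    "optimizer": "LARS",
    "projector_output_dim": 128,
}
```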