On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning

Authors: Bokun Wang, Yunwen Lei, Yiming Ying, Tianbao Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically compare our algorithm to representative baselines on the contrastive image-language pretraining task. Experimental results on the CC3M and CC12M datasets demonstrate the superior overall performance of our algorithm.
Researcher Affiliation | Academia | Texas A&M University; University of Hong Kong; University of Sydney
Pseudocode | Yes | Algorithm 1: NUCLR Algorithm for Self-Supervised Representation Learning
Open Source Code | Yes | Our code is available at https://github.com/bokun-wang/NUCLR.
Open Datasets | Yes | In our experiments, we apply our algorithm to bimodal self-supervised representation learning on the Conceptual Captions (CC3M) (Sharma et al., 2018) and Conceptual 12M (CC12M) (Changpinyo et al., 2021) datasets. ... Retrieval performance is evaluated on the test splits of the Flickr30k (Plummer et al., 2015) and MSCOCO (Lin et al., 2014) datasets, ... The top-1 classification accuracy is evaluated on the CIFAR100 (Krizhevsky et al., 2009), ImageNet1k (Russakovsky et al., 2015), and ImageNet-R (Hendrycks et al., 2021) datasets.
Dataset Splits | Yes | Retrieval performance is evaluated on the test splits of the Flickr30k (Plummer et al., 2015) and MSCOCO (Lin et al., 2014) datasets, in terms of the average Recall@1 score of image-to-text and text-to-image retrievals. The top-1 classification accuracy is evaluated on the CIFAR100 (Krizhevsky et al., 2009), ImageNet1k (Russakovsky et al., 2015), and ImageNet-R (Hendrycks et al., 2021) datasets.
Hardware Specification | Yes | All experiments utilize distributed data-parallel (DDP) training on two NVIDIA A100 GPUs with 40GB memory, and the total batch size B in each iteration is 512.
Software Dependencies | No | The paper mentions "AdamW (Loshchilov and Hutter, 2017)" as the optimizer and "ResNet-50 as the vision encoder and DistilBERT as the text encoder". It also refers to adapting implementations from the "OpenCLIP repository". However, it does not provide specific version numbers for any software libraries, programming languages, or environments.
Experiment Setup | Yes | All experiments utilize distributed data-parallel (DDP) training on two NVIDIA A100 GPUs with 40GB memory, and the total batch size B in each iteration is 512. Besides, we use ResNet-50 as the vision encoder and DistilBERT as the text encoder. ... We run each algorithm 3 times with different random seeds and each run contains 30 epochs. Hyperparameters of all algorithms are tuned based on the validation performance. The optimizer for the model parameter w is AdamW (Loshchilov and Hutter, 2017) with a weight decay of 0.02 and a cosine learning rate schedule (Loshchilov and Hutter, 2016). For all algorithms, we choose a fixed temperature parameter τ tuned within {0.01, 0.03, 0.05, 0.07}. For SogCLR and NUCLR, we set γ = 0.8 as in the SogCLR paper (Yuan et al., 2022). For our NUCLR, we select ζ0 = 0.05 on the CC3M dataset and ζ0 = 0 on the CC12M dataset. Besides, we freeze ζ in the first 5 epochs and update ζ by the SGDm optimizer with a cosine learning rate schedule.
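The setup above relies on a cosine learning rate schedule (Loshchilov and Hutter, 2016) for both the AdamW and SGDm optimizers. A minimal sketch of that schedule is below; the base learning rate and the dataset size used to derive `total_steps` are illustrative assumptions, since the report quotes only the batch size (512) and epoch count (30), not the learning rate itself.

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate (Loshchilov & Hutter, 2016).

    Decays smoothly from base_lr at step 0 down to min_lr at total_steps.
    """
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Illustrative values only: the base LR is not stated in the report, and
# ~2.9M pairs is an approximate size for CC3M.
base_lr = 1e-3
steps_per_epoch = 2_900_000 // 512
total_steps = 30 * steps_per_epoch

lr_start = cosine_lr(0, total_steps, base_lr)           # equals base_lr
lr_mid = cosine_lr(total_steps // 2, total_steps, base_lr)
lr_end = cosine_lr(total_steps, total_steps, base_lr)   # decays to ~0
print(lr_start, lr_mid, lr_end)
```

In practice this per-step value would be assigned to each optimizer parameter group before the update; frameworks such as PyTorch provide an equivalent built-in (`CosineAnnealingLR`).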