On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
Authors: Bokun Wang, Yunwen Lei, Yiming Ying, Tianbao Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare our algorithm to representative baselines on the contrastive image-language pretraining task. Experimental results on the CC3M and CC12M datasets demonstrate the superior overall performance of our algorithm. |
| Researcher Affiliation | Academia | ¹Texas A&M University, ²University of Hong Kong, ³University of Sydney |
| Pseudocode | Yes | Algorithm 1 NUCLR Algorithm for Self-Supervised Representation Learning |
| Open Source Code | Yes | Our code is available at https://github.com/bokun-wang/NUCLR. |
| Open Datasets | Yes | In our experiments, we apply our algorithm to bimodal self-supervised representation learning on the Conceptual Captions (CC3M) (Sharma et al., 2018) and Conceptual 12M (CC12M) (Changpinyo et al., 2021) datasets. ... Retrieval performance is evaluated on the test splits of the Flickr30k (Plummer et al., 2015) and MSCOCO (Lin et al., 2014) datasets, ... The top-1 classification accuracy is evaluated on the CIFAR100 (Krizhevsky et al., 2009), ImageNet-1k (Russakovsky et al., 2015), and ImageNet-R (Hendrycks et al., 2021) datasets. |
| Dataset Splits | Yes | Retrieval performance is evaluated on the test splits of the Flickr30k (Plummer et al., 2015) and MSCOCO (Lin et al., 2014) datasets, in terms of the average Recall@1 score of image-to-text and text-to-image retrievals. The top-1 classification accuracy is evaluated on the CIFAR100 (Krizhevsky et al., 2009), ImageNet-1k (Russakovsky et al., 2015), and ImageNet-R (Hendrycks et al., 2021) datasets. |
| Hardware Specification | Yes | All experiments utilize distributed data-parallel (DDP) training on two NVIDIA A100 GPUs with 40GB memory and the total batch size B in each iteration is 512. |
| Software Dependencies | No | The paper mentions "AdamW (Loshchilov and Hutter, 2017)" as the optimizer and "ResNet-50 as the vision encoder and DistilBERT as the text encoder". It also refers to adapting implementations from the "OpenCLIP repository". However, it does not provide specific version numbers for any software libraries, programming languages, or environments. |
| Experiment Setup | Yes | All experiments utilize distributed data-parallel (DDP) training on two NVIDIA A100 GPUs with 40GB memory and the total batch size B in each iteration is 512. Besides, we use ResNet-50 as the vision encoder and DistilBERT as the text encoder. ... We run each algorithm 3 times with different random seeds and each run contains 30 epochs. Hyperparameters of all algorithms are tuned based on the validation performance. The optimizer for the model parameter w is AdamW (Loshchilov and Hutter, 2017) with a weight decay of 0.02 and a cosine learning rate schedule (Loshchilov and Hutter, 2016). For all algorithms, we choose a fixed temperature parameter τ tuned within {0.01, 0.03, 0.05, 0.07}. For SogCLR and NUCLR, we set γ = 0.8 as in the SogCLR paper (Yuan et al., 2022). For our NUCLR, we select ζ0 = 0.05 on the CC3M dataset and ζ0 = 0 on the CC12M dataset. Besides, we freeze ζ in the first 5 epochs and update ζ by the SGDm optimizer with a cosine learning rate schedule. |
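The ζ update quoted above (frozen for the first 5 epochs, then stepped by SGD with momentum under a cosine learning-rate schedule) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the momentum value of 0.9, and the zero floor of the cosine schedule are assumptions.

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine learning-rate schedule (Loshchilov & Hutter, 2016):
    decays from lr_max at step 0 to lr_min at total_steps."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

def update_zeta(zeta, grad, buf, epoch, base_lr,
                total_epochs=30, freeze_epochs=5, momentum=0.9):
    """One SGD-with-momentum step on the scalar zeta.

    zeta stays frozen during the first `freeze_epochs` epochs, matching the
    quoted setup; momentum=0.9 is an illustrative assumption. Returns the
    updated (zeta, momentum buffer) pair.
    """
    if epoch < freeze_epochs:
        return zeta, buf                       # frozen: no update
    lr = cosine_lr(epoch, total_epochs, base_lr)
    buf = momentum * buf + grad                # momentum accumulation
    return zeta - lr * buf, buf
```

For example, with `zeta = 0.05` (the CC3M initialization quoted above), calling `update_zeta` during epochs 0–4 returns ζ unchanged, while later epochs apply a momentum step whose size shrinks along the cosine curve.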