On the Comparison between Multi-modal and Single-modal Contrastive Learning
Authors: Wei Huang, Andi Han, Yongqiang Chen, Yuan Cao, Zhiqiang Xu, Taiji Suzuki
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments on both synthetic and real-world datasets further consolidate our theoretical findings. |
| Researcher Affiliation | Academia | Wei Huang (RIKEN AIP), Andi Han (RIKEN AIP), Yongqiang Chen (The Chinese University of Hong Kong), Yuan Cao (The University of Hong Kong), Zhiqiang Xu (MBZUAI), Taiji Suzuki (University of Tokyo & RIKEN AIP) |
| Pseudocode | No | The paper includes mathematical equations and derivations (e.g., in Sections 3.1, 3.2, and 5), but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We have also uploaded the code in the supplementary material. |
| Open Datasets | Yes | Synthetic experiments We conduct synthetic experiments to verify the theoretical results obtained in the previous sections. We generate samples following the theoretical setups, where we set the data dimension d = 2000, number of training samples n = 100, number of test samples ntest = 200, and the hidden size of all encoders as m = 50. [...] Real-world experiments We now extend the comparison of single-modal and multi-modal learning to realistic image data, Colored MNIST [3, 54], which is a typical benchmark studying the generalization capability under distribution shifts. |
| Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention or detail a separate validation split for either the synthetic or real-world experiments. For example, it states, 'number of training samples n = 100, number of test samples ntest = 200' and describes the setup for 'training set' and 'test set' for Colored MNIST, but no 'validation set'. |
| Hardware Specification | Yes | We run all the experiments on Linux servers with NVIDIA V100 graphics cards and CUDA 11.2, completing them within one hour. |
| Software Dependencies | No | The paper states 'We implement our methods using PyTorch.' and mentions 'CUDA 11.2'. While a version is given for CUDA, none is given for PyTorch, which is a key software dependency; therefore, complete version information is not provided. |
| Experiment Setup | Yes | We adopt gradient descent with a learning rate of 0.01 as the optimizer to train the model for 200 epochs. In the single-modal setting, µ is set to [5, 0, ..., 0]^T and ξ ∼ N(0, I) for the in-distribution data, and the augmentation vector ϵ ∼ N(0, 0.01·I). For the multi-modal setting, µ = [0, 15, 0, ..., 0]^T. In addition, for the OOD test data x_test = [ν, ζ] ∼ D_test, we set ν = [2, 0, ..., 0] and ζ ∼ N(0, I). [...] For the training set, 10% of labels will be flipped to a random class. For images with class 0 (or 1), they will be colored red (or green) with a probability of 77.5%, and a random other color with a probability of 22.5%. |
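The quoted synthetic setup can be sketched in code. This is a hypothetical reconstruction from the hyperparameters reported above (d = 2000, n = 100, n_test = 200, hidden size m = 50); the binary labels and the additive signal-plus-noise data model are assumptions, since the paper's exact generative process is not fully quoted here.

```python
import numpy as np

# Hypothetical sketch of the synthetic data generation described in the report.
# Assumption: binary labels and an additive signal-plus-noise model; the
# paper's actual setup may include structure not visible in the quoted excerpt.

rng = np.random.default_rng(0)
d, n, n_test, m = 2000, 100, 200, 50  # dimension, train/test sizes, hidden size

# In-distribution single-modal data: label-scaled signal mu plus noise xi ~ N(0, I)
mu_single = np.zeros(d)
mu_single[0] = 5.0                    # mu = [5, 0, ..., 0]^T
y = rng.choice([-1.0, 1.0], size=n)   # assumed binary labels
xi = rng.standard_normal((n, d))
x_train = y[:, None] * mu_single + xi

# Augmented view: add epsilon ~ N(0, 0.01 * I)
x_aug = x_train + np.sqrt(0.01) * rng.standard_normal((n, d))

# Multi-modal setting uses a different signal direction: mu = [0, 15, 0, ..., 0]^T
mu_multi = np.zeros(d)
mu_multi[1] = 15.0

# OOD test data x_test = [nu, zeta] ~ D_test: nu = [2, 0, ..., 0], zeta ~ N(0, I)
nu = np.zeros(d)
nu[0] = 2.0
zeta = rng.standard_normal((n_test, d))
x_test = nu + zeta
```

Training would then proceed with plain gradient descent at learning rate 0.01 for 200 epochs, as reported in the table.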