Can One Modality Model Synergize Training of Other Modality Models?

Authors: Jae-Jun Lee, Sung Whan Yoon

ICLR 2025

Reproducibility variables, with the assessed result and the supporting LLM response:
Research Type: Experimental. "As proofs of concept, we broadly confirm the considerable gains from the synergy across visual, language, and audio models. ... In this section, we provide an overview of the experimental results, along with detailed descriptions of the datasets, models and additional experimental settings. ... Our main results are two parts: Table 1 with ImageNet-1K and Table 2 with multimodal datasets, i.e., IEMOCAP and AVMNIST."
Researcher Affiliation: Academia. "Jae-Jun Lee (1), Sung Whan Yoon (1,2); (1) Graduate School of Artificial Intelligence and (2) Department of Electrical Engineering, Ulsan National Institute of Science and Technology (UNIST). EMAIL"
Pseudocode: Yes. "Algorithm 1: Training Procedures for [Mj Mi]"
Open Source Code: Yes. "The code is available at https://github.com/johnjaejunlee95/synergistic-multimodal."
Open Datasets: Yes. "Datasets: For the main experiments, we test on the ImageNet-1K dataset (Krizhevsky et al., 2012) for visual tasks as the case of [L V]. For further experiments in the multimodal setting, we employ the IEMOCAP (Busso et al., 2008) and AVMNIST (Liang et al., 2021; Li et al., 2023) datasets."
Dataset Splits: Yes. "Datasets: For the main experiments, we test on the ImageNet-1K dataset (Krizhevsky et al., 2012) for visual tasks as the case of [L V]. ... ImageNet-1K (Krizhevsky et al., 2012) is an image dataset that contains 1,000 classes with 1,281,167 training images and 50,000 validation images."
Hardware Specification: Yes. "Furthermore, we utilized Automatic Mixed Precision (Micikevicius et al., 2018) in conjunction with 4 A6000 GPUs."
Software Dependencies: No. The paper mentions software such as PyTorch and the AdamW and Adam optimizers, but cites the papers introducing them rather than the specific versions used in the experiments.
Experiment Setup: Yes. "In the [L V] case for the ImageNet-1K classification task, we adhered to the hyperparameter settings established by AugReg-ViT (Steiner et al., 2022) for all training models, specifically ResNet-50, ViT-B/32, and ViT-B/16. For the baseline model, we trained for 300 epochs with a batch size of 1024, utilizing a learning rate of 1x10^-3 and a weight decay of 5x10^-2. We employed the AdamW optimizer (Loshchilov & Hutter, 2019) with cosine learning rate scheduling (Loshchilov & Hutter, 2017) and implemented a linear warmup for 20 epochs."
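The quoted schedule (20-epoch linear warmup into cosine decay, 300 epochs total, base learning rate 1x10^-3) can be sketched as a small standalone function. This is a minimal sketch, not the authors' code: the function name `lr_at_epoch` is hypothetical, and the paper's actual scheduler may operate per step rather than per whole epoch.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=20, total_epochs=300):
    """Linear warmup followed by cosine decay (hypothetical helper,
    mirroring the quoted AdamW + cosine-schedule setup)."""
    if epoch < warmup_epochs:
        # Linear warmup: ramp from 0 up to base_lr over the first 20 epochs.
        return base_lr * epoch / warmup_epochs
    # Cosine decay: anneal from base_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, the rate is half the base value midway through warmup (epoch 10) and reaches the base value exactly when warmup ends (epoch 20).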