Towards Equilibrium: An Instantaneous Probe-and-Rebalance Multimodal Learning Approach

Authors: Yang Yang, Xixian Wu, Qing-Yuan Jiang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments reveal that our proposed IPRM outperforms all baselines, achieving state-of-the-art (SOTA) performance on numerous widely used datasets. The code is available at https://github.com/njustkmg/IJCAI25-IPRM.
Researcher Affiliation | Academia | Nanjing University of Science and Technology
Pseudocode | Yes | Algorithm 1: The IPRM learning algorithm.
Open Source Code | Yes | The code is available at https://github.com/njustkmg/IJCAI25-IPRM.
Open Datasets | Yes | We utilize five datasets for experiments, i.e., the CREMA-D [Cao et al., 2014], KSounds [Arandjelovic and Zisserman, 2017], NVGesture [Molchanov et al., 2016], IEMOCAP [Busso et al., 2008], and Sarcasm [Cai et al., 2019] datasets.
Dataset Splits | Yes | The CREMA-D dataset contains 7,442 clips from 91 actors, divided into a training set of 6,698 samples and a testing set of 744 samples. The KSounds dataset is divided into a training set of 15K samples, a validation set of 1.9K samples, and a testing set of 1.9K samples. The NVGesture dataset is split into 1,050 data points for training and 482 for testing. The IEMOCAP dataset is split into a training set of 3,318 samples and a testing set of 1,107 samples. The Sarcasm dataset consists of 24,635 samples, split into a training set of 19,816 samples, a testing set of 2,409 samples, and a validation set of 2,410 samples.
Hardware Specification | Yes | All experiments are conducted on an NVIDIA GeForce RTX 4090 card.
Software Dependencies | No | The paper mentions models such as ResNet18, I3D, M3AE, CAVMAE, BERT, and CLIP but does not provide specific version numbers for the software libraries or frameworks (e.g., PyTorch, TensorFlow, Python) used for implementation.
Experiment Setup | Yes | The optimization algorithm for the audio-video and trimodal datasets is stochastic gradient descent (SGD), while Adam is employed for the image-text dataset. The learning rate is set to 10^-2 for the audio-video datasets and NVGesture, 10^-3 for IEMOCAP, and 10^-4 for Sarcasm, respectively; it is then reduced by a factor of 10 when the loss saturates. The batch size is set to 64 for CREMA-D, KSounds, and Sarcasm, while it is set to 2 and 16 for NVGesture and IEMOCAP, respectively, due to out-of-memory issues. Furthermore, the hyper-parameter α is set to 0.8 for the audio-video datasets and 0.7 for the trimodal and image-text datasets, based on a cross-validation strategy.
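The Sarcasm split sizes reported under Dataset Splits can be sanity-checked with simple arithmetic: the train, test, and validation counts should sum to the stated total of 24,635 samples. A minimal check:

```python
# Sarcasm dataset split sizes as quoted in the reproducibility report.
train, test, val = 19_816, 2_409, 2_410
total = 24_635

# The three splits account for every sample in the dataset.
assert train + test + val == total
```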
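The learning-rate schedule described in the Experiment Setup row (reduce by a factor of 10 when the loss saturates) is the familiar reduce-on-plateau pattern, e.g. PyTorch's `ReduceLROnPlateau`. A minimal pure-Python sketch of that pattern; the class name, `patience`, and `min_delta` values here are illustrative assumptions, not taken from the paper:

```python
class PlateauLRDecay:
    """Divide the learning rate by 10 once the loss stops improving.

    "Saturation" is approximated as `patience` consecutive epochs with no
    improvement larger than `min_delta` over the best loss seen so far.
    """

    def __init__(self, lr, factor=0.1, patience=3, min_delta=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stale = 0  # epochs without meaningful improvement

    def step(self, loss):
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor  # decay by the chosen factor (10x here)
                self.stale = 0
        return self.lr


# Example: start at 1e-2 (the audio-video setting) and hit a plateau.
sched = PlateauLRDecay(lr=1e-2)
for loss in [1.0, 0.8, 0.79, 0.79, 0.79, 0.79]:
    lr = sched.step(loss)
# After three stale epochs, lr has dropped from 1e-2 to 1e-3.
```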