Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Decoupling and Reconstructing: A Multimodal Sentiment Analysis Framework Towards Robustness

Authors: Mingzheng Yang, Kai Zhang, Yuyang Ye, Yanghai Zhang, Runlong Yu, Min Hou

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on two benchmark datasets demonstrate that DAR significantly outperforms existing methods in both modality reconstruction and sentiment analysis tasks, particularly in scenarios with missing or unaligned modalities. Our results show improvements of 2.21% in bi-classification accuracy and 3.9% in regression error compared to state-of-the-art baselines on the MOSEI dataset.
Researcher Affiliation Academia Mingzheng Yang (1), Kai Zhang (1), Yuyang Ye (2), Yanghai Zhang (1), Runlong Yu (3), Min Hou (4); (1) State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; (2) Rutgers University; (3) University of Pittsburgh; (4) Hefei University of Technology. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode No The paper describes the methodology using prose and mathematical equations (e.g., equations 1-24) and block diagrams (Figure 2, Figure 3), but does not contain a dedicated pseudocode or algorithm block.
Open Source Code No The paper mentions that baseline models (MISA, Self-MM, MMIM, CENET, TETFN, ALMT, LNLN) were reproduced or used from open source code, but there is no explicit statement or link indicating that the authors' own proposed model (DAR) has its code publicly released.
Open Datasets Yes In this section, we provide a comprehensive and fair comparison between the proposed DAR and previous representative MSA methods on MOSI ([Zadeh et al., 2016]) and MOSEI ([Bagher Zadeh et al., 2018]) datasets.
Dataset Splits Yes MOSI: The dataset includes 2,199 multimodal samples, integrating visual, audio, and language modalities. It is divided into a training set of 1,284 samples, a validation set of 229 samples, and a test set of 686 samples. MOSEI: The dataset consists of 22,856 video clips sourced from YouTube. The samples are divided into 16,326 clips for training, 1,871 for validation, and 4,659 for testing.
Hardware Specification No The paper describes the software tools used (BERT, Librosa, OpenFace) and the datasets, but does not specify any hardware details such as GPU/CPU models or other computing infrastructure used for experiments.
Software Dependencies No Each modality is processed using widely-used tools: language data is encoded using BERT ([Devlin, 2018]), audio features are extracted through Librosa ([McFee et al., 2015]), and visual features are obtained using OpenFace ([Baltrusaitis et al., 2018]). (No specific version numbers for these tools are provided.)
Experiment Setup Yes In the training process, for hyperparameters, we choose λ = 0.7, α = 0.1, β = 0.1. On the MOSI dataset, we choose the missing rate k = 0.3, and on the MOSEI dataset, we choose k = 0.4. Compared with the baseline LNLN ([Zhang et al., 2024a]), which uses the best model under each metric for testing, we use the single model with the smallest overall loss as the optimal model for testing; to ensure the stability of the results, we randomly test three times and take the average value as the final result, following the baseline settings.
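The evaluation protocol quoted above (randomly dropping modality frames at a missing rate k, then averaging metrics over three randomized test runs) can be sketched as follows. This is a minimal illustrative assumption, not the authors' released code: the function names `apply_missing` and `average_over_runs`, and the zero-masking strategy for dropped frames, are hypothetical.

```python
import numpy as np

def apply_missing(features: np.ndarray, k: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Zero out a random fraction ~k of time steps to simulate
    missing modality frames at missing rate k."""
    keep = rng.random(features.shape[0]) >= k  # each frame kept with prob 1 - k
    return features * keep[:, None]

def average_over_runs(run_metric, n_runs: int = 3, seed: int = 0) -> float:
    """Evaluate n_runs times with different random seeds and
    report the mean, mirroring the three-run averaging in the paper."""
    results = [run_metric(np.random.default_rng(seed + i)) for i in range(n_runs)]
    return float(np.mean(results))
```

With k = 0.0 all frames survive and with k = 1.0 every frame is zeroed, so the masking behaves as a continuous dial between the full and fully-missing settings.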