Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Decoupling and Reconstructing: A Multimodal Sentiment Analysis Framework Towards Robustness
Authors: Mingzheng Yang, Kai Zhang, Yuyang Ye, Yanghai Zhang, Runlong Yu, Min Hou
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two benchmark datasets demonstrate that DAR significantly outperforms existing methods in both modality reconstruction and sentiment analysis tasks, particularly in scenarios with missing or unaligned modalities. Our results show improvements of 2.21% in bi-classification accuracy and 3.9% in regression error compared to state-of-the-art baselines on the MOSEI dataset. |
| Researcher Affiliation | Academia | Mingzheng Yang¹, Kai Zhang¹, Yuyang Ye², Yanghai Zhang¹, Runlong Yu³, Min Hou⁴. ¹State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; ²Rutgers University; ³University of Pittsburgh; ⁴Hefei University of Technology. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using prose and mathematical equations (e.g., equations 1-24) and block diagrams (Figure 2, Figure 3), but does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper mentions that baseline models (MISA, Self-MM, MMIM, CENET, TETFN, ALMT, LNLN) were reproduced or used from open source code, but there is no explicit statement or link indicating that the authors' own proposed model (DAR) has its code publicly released. |
| Open Datasets | Yes | In this section, we provide a comprehensive and fair comparison between the proposed DAR and previous representative MSA methods on MOSI ([Zadeh et al., 2016]) and MOSEI ([Bagher Zadeh et al., 2018]) datasets. |
| Dataset Splits | Yes | MOSI: The dataset includes 2,199 multimodal samples, integrating visual, audio, and language modalities. It is divided into a training set of 1,284 samples, a validation set of 229 samples, and a test set of 686 samples. MOSEI: The dataset consists of 22,856 video clips sourced from YouTube. The samples are divided into 16,326 clips for training, 1,871 for validation, and 4,659 for testing. |
| Hardware Specification | No | The paper describes the software tools used (BERT, Librosa, OpenFace) and the datasets, but does not specify any hardware details such as GPU/CPU models or other computing infrastructure used for experiments. |
| Software Dependencies | No | Each modality is processed using widely-used tools: language data is encoded using BERT ([Devlin, 2018]), audio features are extracted through Librosa ([McFee et al., 2015]), and visual features are obtained using OpenFace ([Baltrusaitis et al., 2018]). (No specific version numbers for these tools are provided.) |
| Experiment Setup | Yes | In the training process, we set the hyperparameters λ = 0.7, α = 0.1, β = 0.1. On the MOSI dataset we use the missing rate k = 0.3, and on the MOSEI dataset k = 0.4. Unlike the baseline LNLN ([Zhang et al., 2024a]), which tests the best model under each metric separately, we test a single model, the one with the smallest overall loss. To ensure stable results, we run the test three times with random seeds and report the average as the final result, following the baseline settings. |
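The evaluation protocol quoted above (fixed hyperparameters, a per-dataset missing rate, one checkpoint selected by smallest overall loss, and the mean of three seeded test runs) can be sketched as follows. This is a minimal illustration, not the authors' code: the `evaluate` function and its simulated metric are hypothetical stand-ins for a real test run of the DAR model.

```python
import random
import statistics

# Hyperparameters reported in the paper's experiment setup.
HPARAMS = {"lambda": 0.7, "alpha": 0.1, "beta": 0.1}

# Missing rate k per dataset, as quoted above.
MISSING_RATE = {"MOSI": 0.3, "MOSEI": 0.4}


def evaluate(dataset: str, seed: int) -> float:
    """Hypothetical single test run.

    A real run would load the checkpoint with the smallest overall
    loss and score it on the test split under missing rate k; here
    we only simulate a seed-dependent metric for illustration.
    """
    rng = random.Random(seed)
    # Simulated accuracy near 0.85, perturbed slightly per seed.
    return 0.85 + rng.uniform(-0.01, 0.01) * MISSING_RATE[dataset]


def final_score(dataset: str, n_runs: int = 3) -> float:
    """Average over three randomly seeded test runs, as described."""
    return statistics.mean(evaluate(dataset, seed) for seed in range(n_runs))


if __name__ == "__main__":
    print(round(final_score("MOSI"), 4))
```

The key reproducibility point this sketch captures is model selection: a single checkpoint is chosen by overall loss and reused across all metrics, rather than picking the best checkpoint per metric as LNLN does.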