MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models

Authors: Xilin He, Haijian Liang, Boyi Peng, Weicheng Xie, Muhammad Haris Khan, Siyang Song, Zitong Yu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on CMU-MOSI, CMU-MOSEI and CH-SIMS demonstrate the superior performance of the proposed MSAmba over prior Transformer-based and CNN-based methods. Quantitative results show that MSAmba outperforms previous state-of-the-art (SOTA) methods, which rely heavily on cross-attention transformers, in terms of performance, parameter size and inference speed. Further, as a pioneering study adapting a Mamba-based architecture to MSA, we observe that simply applying vanilla Mamba in a naive paradigm can achieve results comparable to previous SOTAs, demonstrating the research potential of Mamba-based architectures in MSA. We evaluate MSAmba on three standard datasets for MSA, namely CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Zadeh et al. 2018) and CH-SIMS (Yu et al. 2020).
Researcher Affiliation Academia (1) Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University; (2) Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen; (3) Guangdong Provincial Key Laboratory of Intelligent Information Processing; (4) Mohamed bin Zayed University of Artificial Intelligence; (5) University of Exeter; (6) Great Bay University
Pseudocode No The paper includes architectural diagrams (Figure 1, 2, 3) illustrating the proposed model components and their interactions, but it does not contain any explicit pseudocode or algorithm blocks detailing the procedural steps in a structured, code-like format.
Open Source Code Yes Code is available at https://github.com/C0notSilly/MSAmba.
Open Datasets Yes We evaluate MSAmba on three standard datasets for MSA, namely CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Zadeh et al. 2018) and CH-SIMS (Yu et al. 2020).
Dataset Splits No On evaluation metrics, following prior works (Yang et al. 2023; Yang, Dong, and Qiang 2024; Zhang et al. 2023), we adopt binary classification accuracy (Acc-2), F1, seven-class classification accuracy (Acc-7), mean absolute error (MAE) and the correlation of the model's predictions with human annotations (Corr). Further, on CMU-MOSI and CMU-MOSEI, following the protocol of prior works (Yang et al. 2023; Yang, Dong, and Qiang 2024; Zhang et al. 2023), Acc-2 and F1 are calculated in two ways: negative/non-negative and negative/positive. The paper states that it follows the evaluation protocols of prior works, but it does not explicitly report the training/validation/test splits (e.g., percentages or sample counts) used for the datasets.
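To make the two Acc-2 conventions concrete, the sketch below computes both from continuous sentiment scores. It assumes the convention commonly used on CMU-MOSI/CMU-MOSEI (zero labels count as non-negative in the first variant and are excluded in the second); the paper itself does not spell out these details, so this is an illustration, not the authors' evaluation code.

```python
import numpy as np

def acc2_two_ways(preds, labels):
    """Binary accuracy on continuous sentiment scores, computed under the
    two conventions used on CMU-MOSI/CMU-MOSEI (assumed, not from the paper)."""
    preds, labels = np.asarray(preds, dtype=float), np.asarray(labels, dtype=float)
    # negative/non-negative: zero labels count as non-negative, all samples kept
    acc_non_neg = float(np.mean((preds >= 0) == (labels >= 0)))
    # negative/positive: samples whose ground-truth label is exactly zero are excluded
    nz = labels != 0
    acc_neg_pos = float(np.mean((preds[nz] > 0) == (labels[nz] > 0)))
    return acc_non_neg, acc_neg_pos
```

F1 under each variant would be computed on the same two binarized label sets.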
Hardware Specification Yes All the experiments are conducted on a single NVIDIA A100-80GB GPU.
Software Dependencies No We use AdamW to optimize the model. The paper mentions using AdamW as the optimizer, along with BERT (Devlin et al. 2019) for text features, Librosa (McFee et al. 2015) for audio features, and OpenFace (Baltrušaitis, Robinson, and Morency 2016) for visual features. However, it does not specify version numbers for these software components or any other key libraries used for implementation.
Experiment Setup Yes The hidden states dimension and the expansion coefficient of each Mamba block are set as 128 and 2, respectively. The numbers of ISM blocks and CHM blocks are set as 2 and 1 across various datasets, respectively. We use AdamW to optimize the model. We train the model for 200 epochs, with a batch size of 128. λ for balancing the training loss is set as 0.5 across all datasets.
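The reported setup can be collected into a single configuration sketch. The key names below are illustrative (the official MSAmba codebase may name them differently), and the exact form of the combined loss is an assumption: the paper only states that λ balances the training loss terms.

```python
# Hyperparameters as reported in the paper; key names are illustrative,
# not taken from the official MSAmba repository.
config = {
    "mamba_hidden_dim": 128,  # hidden-states dimension of each Mamba block
    "mamba_expand": 2,        # expansion coefficient of each Mamba block
    "num_ism_blocks": 2,      # intra-modal (ISM) blocks
    "num_chm_blocks": 1,      # cross-modal (CHM) blocks
    "optimizer": "AdamW",
    "epochs": 200,
    "batch_size": 128,
    "loss_lambda": 0.5,       # λ balancing the training loss terms
}

def total_loss(task_loss, aux_loss, lam=config["loss_lambda"]):
    # Assumed combined objective: L = L_task + λ · L_aux; the paper only
    # says λ balances the training loss, not the exact formulation.
    return task_loss + lam * aux_loss
```

Training ran on a single NVIDIA A100-80GB GPU per the Hardware Specification row above.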