Bridging the Gap for Test-Time Multimodal Sentiment Analysis

Authors: Zirun Guo, Tao Jin, Wenlong Xu, Wang Lin, Yangyang Wu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"Extensive experiments show that CASP brings significant and consistent improvements to the performance of the model across various distribution shift settings and with different backbones, demonstrating its effectiveness and versatility. We conduct extensive experiments on three multimodal sentiment analysis datasets: CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Bagher Zadeh et al. 2018) and CH-SIMS (Yu et al. 2020). We present our quantitative results across five different distribution shift settings with two different backbones in Table 1."
Researcher Affiliation: Academia
Zirun Guo*, Tao Jin, Wenlong Xu*, Wang Lin, Yangyang Wu (Zhejiang University)
Pseudocode: No
The paper describes the methodology in prose and mathematical equations in Section 3, but does not present any structured pseudocode or algorithm blocks.
Open Source Code: Yes
Code: https://github.com/zrguo/CASP
Open Datasets: Yes
"We conduct extensive experiments on three multimodal sentiment analysis datasets: CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Bagher Zadeh et al. 2018) and CH-SIMS (Yu et al. 2020)."
Dataset Splits: No
The paper describes using source and target domains for adaptation (e.g., CMU-MOSEI -> CH-SIMS), but does not provide specific training, validation, or test splits (e.g., percentages or counts) for the source domain data during pre-training, nor how the target domain data is further split for evaluation within the TTA context.
Hardware Specification: No
The paper does not provide specific hardware details such as GPU or CPU models used for the experiments.
Software Dependencies: No
"For text modality, we use pre-trained BERT (Devlin et al. 2019) to obtain word embeddings. We use BERT-base for CMU-MOSI and CMU-MOSEI and Chinese BERT-base for CH-SIMS. For audio modality, we use LibROSA (McFee et al. 2015) to extract features. For video modality, we extract face features using the OpenFace 2.0 (Baltrusaitis et al. 2018) toolkit." The paper names software tools such as BERT, LibROSA, and OpenFace 2.0, but does not provide version numbers for these dependencies.
Experiment Setup: Yes
"For source domain pre-training and contrastive adaptation, we use the AdamW optimizer with a learning rate of 1e-3. We adapt the model for 15 epochs and the interval hyperparameter M is set to 3. For stable pseudo-label generation, we set the threshold hyperparameter λ as 95. For self-training using stable pseudo labels, we use the AdamW optimizer with a learning rate of 5e-4 and train the model for 5 epochs. The batch size of all the experiments is 24. Besides, we use gradient clipping and set the threshold as 0.8. We also use a step scheduler with a step size of 10 and decay rate γ = 0.1."
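The reported step schedule can be made concrete with a minimal sketch. This is not the authors' code; it only assumes the standard step-decay rule (multiply the learning rate by γ every `step_size` epochs) applied to the learning rates quoted above.

```python
# Hypothetical sketch of the reported schedule, not the CASP implementation.
# Step scheduler: step size 10, decay rate gamma = 0.1.

def stepped_lr(base_lr: float, epoch: int,
               step_size: int = 10, gamma: float = 0.1) -> float:
    """Learning rate at a given epoch under standard step decay."""
    return base_lr * gamma ** (epoch // step_size)

# Adaptation phase: AdamW, base lr 1e-3, 15 epochs.
# Under this schedule the lr would decay once, at epoch 10.
adapt_lrs = [stepped_lr(1e-3, e) for e in range(15)]

# Self-training phase: AdamW, base lr 5e-4, 5 epochs.
# With only 5 epochs, no decay step would be reached.
self_train_lrs = [stepped_lr(5e-4, e) for e in range(5)]
```

Under these assumptions, the adaptation learning rate stays at 1e-3 for epochs 0–9 and drops to 1e-4 for epochs 10–14, while the 5-epoch self-training phase runs entirely at 5e-4.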