Bridging the Gap for Test-Time Multimodal Sentiment Analysis
Authors: Zirun Guo, Tao Jin, Wenlong Xu, Wang Lin, Yangyang Wu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that CASP brings significant and consistent improvements to the performance of the model across various distribution shift settings and with different backbones, demonstrating its effectiveness and versatility. We conduct extensive experiments on three multimodal sentiment analysis datasets: CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Bagher Zadeh et al. 2018) and CH-SIMS (Yu et al. 2020). We present our quantitative results across five different distribution shift settings with two different backbones in Table 1. |
| Researcher Affiliation | Academia | Zirun Guo*, Tao Jin, Wenlong Xu*, Wang Lin, Yangyang Wu, Zhejiang University |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations in Section 3, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/zrguo/CASP |
| Open Datasets | Yes | We conduct extensive experiments on three multimodal sentiment analysis datasets: CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Bagher Zadeh et al. 2018) and CH-SIMS (Yu et al. 2020). |
| Dataset Splits | No | The paper describes using source and target domains for adaptation (e.g., CMU-MOSEI -> CH-SIMS), but does not provide specific training, validation, or test splits (e.g., percentages or counts) for the source domain data during pre-training, nor how the target domain data is further split for evaluation within the TTA context. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | For text modality, we use pre-trained BERT (Devlin et al. 2019) to obtain word embeddings. We use BERT-base for CMU-MOSI and CMU-MOSEI and Chinese BERT-base for CH-SIMS. For audio modality, we use librosa (McFee et al. 2015) to extract features. For video modality, we extract face features using the OpenFace 2.0 (Baltrusaitis et al. 2018) toolkit. The paper mentions software tools such as BERT, librosa, and OpenFace 2.0, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For source domain pre-training and contrastive adaptation, we use the AdamW optimizer with a learning rate of 1e-3. We adapt the model for 15 epochs and the interval hyperparameter M is set to 3. For stable pseudo-label generation, we set the threshold hyperparameter λ as 95. For self-training using stable pseudo-labels, we use the AdamW optimizer with a learning rate of 5e-4 and train the model for 5 epochs. The batch size of all the experiments is 24. Besides, we use gradient clipping and set the threshold as 0.8. We also use a step scheduler with a step size of 10 and decay rate γ = 0.1. |
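The step scheduler quoted in the Experiment Setup row (step size 10, decay rate γ = 0.1) determines the learning rate at each epoch. A minimal sketch in plain Python, assuming standard step-decay semantics (lr = base_lr · γ^⌊epoch / step_size⌋); the function name and the per-phase usage below are illustrative, not taken from the paper's code:

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 10, gamma: float = 0.1) -> float:
    """Learning rate under a step scheduler: multiplied by `gamma` every `step_size` epochs."""
    return base_lr * (gamma ** (epoch // step_size))

# Adaptation phase: 15 epochs at base lr 1e-3 (values quoted in the table).
adapt_lrs = [step_lr(1e-3, e) for e in range(15)]
# Epochs 0-9 stay at 1e-3; from epoch 10 onward the rate decays to 1e-4.

# Self-training phase: 5 epochs at base lr 5e-4, so the decay step is never reached.
self_train_lrs = [step_lr(5e-4, e) for e in range(5)]
```

In PyTorch this schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)` wrapped around the AdamW optimizer; the gradient-clipping threshold of 0.8 mentioned in the same row would be applied separately per step.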