Bridging the Gap for Test-Time Multimodal Sentiment Analysis

Authors: Zirun Guo, Tao Jin, Wenlong Xu, Wang Lin, Yangyang Wu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"Extensive experiments show that CASP brings significant and consistent improvements to the performance of the model across various distribution shift settings and with different backbones, demonstrating its effectiveness and versatility. We conduct extensive experiments on three multimodal sentiment analysis datasets: CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Bagher Zadeh et al. 2018) and CH-SIMS (Yu et al. 2020). We present our quantitative results across five different distribution shift settings with two different backbones in Table 1."
Researcher Affiliation: Academia
Zirun Guo*, Tao Jin, Wenlong Xu*, Wang Lin, Yangyang Wu (Zhejiang University)
Pseudocode: No
The paper describes the methodology in prose and mathematical equations in Section 3, but does not present any structured pseudocode or algorithm blocks.
Open Source Code: Yes
Code: https://github.com/zrguo/CASP
Open Datasets: Yes
"We conduct extensive experiments on three multimodal sentiment analysis datasets: CMU-MOSI (Zadeh et al. 2016), CMU-MOSEI (Bagher Zadeh et al. 2018) and CH-SIMS (Yu et al. 2020)."
Dataset Splits: No
The paper describes using source and target domains for adaptation (e.g., CMU-MOSEI -> CH-SIMS), but does not provide specific training, validation, or test splits (e.g., percentages or counts) for the source domain data during pre-training, nor how the target domain data is further split for evaluation within the TTA context.
Hardware Specification: No
The paper does not provide specific hardware details such as GPU or CPU models used for the experiments.
Software Dependencies: No
"For text modality, we use pre-trained BERT (Devlin et al. 2019) to obtain word embeddings. We use BERT-base for CMU-MOSI and CMU-MOSEI and Chinese BERT-base for CH-SIMS. For audio modality, we use LibROSA (McFee et al. 2015) to extract features. For video modality, we extract face features using the OpenFace 2.0 (Baltrusaitis et al. 2018) toolkit." The paper names software tools such as BERT, LibROSA, and OpenFace 2.0, but does not provide version numbers for these dependencies.
Experiment Setup: Yes
"For source domain pre-training and contrastive adaptation, we use the AdamW optimizer with a learning rate of 1e-3. We adapt the model for 15 epochs and the interval hyperparameter M is set to 3. For stable pseudo-label generation, we set the threshold hyperparameter λ as 95. For self-training using stable pseudo labels, we use the AdamW optimizer with a learning rate of 5e-4 and train the model for 5 epochs. The batch size of all the experiments is 24. Besides, we use gradient clipping and set the threshold as 0.8. We also use a step scheduler with a step size of 10 and decay rate γ = 0.1."
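The reported step schedule can be made concrete with a minimal sketch. This is not the authors' code; it only assumes the standard step-decay rule (multiply the learning rate by γ every `step_size` epochs) applied to the learning rates quoted above.

```python
# Hypothetical sketch of the reported schedule, not the CASP implementation.
# Step scheduler: step size 10, decay rate gamma = 0.1.

def stepped_lr(base_lr: float, epoch: int,
               step_size: int = 10, gamma: float = 0.1) -> float:
    """Learning rate at a given epoch under standard step decay."""
    return base_lr * gamma ** (epoch // step_size)

# Adaptation phase: AdamW, base lr 1e-3, 15 epochs.
# Under this schedule the lr would decay once, at epoch 10.
adapt_lrs = [stepped_lr(1e-3, e) for e in range(15)]

# Self-training phase: AdamW, base lr 5e-4, 5 epochs.
# With only 5 epochs, no decay step would be reached.
self_train_lrs = [stepped_lr(5e-4, e) for e in range(5)]
```

Under these assumptions, the adaptation learning rate stays at 1e-3 for epochs 0–9 and drops to 1e-4 for epochs 10–14, while the 5-epoch self-training phase runs entirely at 5e-4.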