Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning
Authors: Yan-Kai Liu, Jinyu Cai, Bao-Liang Lu, Wei-Long Zheng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on five public multimodal emotion datasets demonstrate that our model achieves the state-of-the-art performance in the cross-modal tasks and maintains multimodal performance using only a single modality. |
| Researcher Affiliation | Academia | Yan-Kai Liu, Jinyu Cai, Bao-Liang Lu, Wei-Long Zheng* Shanghai Jiao Tong University EMAIL |
| Pseudocode | Yes | Algorithm 1: Pre-training Phase of M2S Model. Input: unlabeled, paired data X^A, X^B from modalities A and B. Hyperparameters: learning rate and weight decay. Output: pre-trained encoders. 1: Feed X^A and X^B into their corresponding encoders: z_R^A = E_R^A(X^A), z_I^A = E_I^A(X^A); z_R^B = E_R^B(X^B), z_I^B = E_I^B(X^B). 2: Compute L_CLUB^A(z_R^A, z_I^A) and L_CLUB^B(z_R^B, z_I^B). 3: Compute L_Recon^A and L_Recon^B. 4: Feed z_R^A and z_R^B into the M2M CPC module. 5: Compute L_CPC^{A2B}, L_CPC^{B2A}, L_CPC^{A2A}, and L_CPC^{B2B}. 6: Flatten z_R^A and z_R^B, and project them into a new space. 7: Compute L_Contra. 8: Optimize the final loss: L = α·L_CLUB + β·L_Recon + γ·L_Contra + λ·L_CPC. 9: Return encoders E_R^A and E_R^B. |
| Open Source Code | Yes | Code: https://github.com/Arcee-LYK/Multi-to-Single. |
| Open Datasets | Yes | In the experiment, we use five public multimodal emotion datasets: SEED (Duan, Zhu, and Lu 2013; Zheng and Lu 2015), SEED-IV (Zheng et al. 2018), SEED-V (Liu et al. 2021), DEAP (Koelstra et al. 2011), and DREAMER (Katsigiannis and Ramzan 2017). |
| Dataset Splits | Yes | For the division of the training and testing sets: because each video clip in the SEED-series datasets has a fixed emotional label, we divide the SEED, SEED-IV, and SEED-V datasets in ratios of 9:6, 16:8, and 10:5, respectively. The labels in the DEAP and DREAMER datasets are subjects' scores on evaluation metrics, including valence, arousal, and dominance. This labeling leads to an uneven distribution of data, so we conduct four-fold and three-fold cross-validation on DEAP and DREAMER, respectively. Each fold's training-to-testing ratio is 3:1 and 2:1, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9, CUDA 11.1) needed to replicate the experiment. |
| Experiment Setup | No | The paper mentions a "pre-training phase" and "fine-tuning stage" but does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations within the main text. |
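Steps 6–8 of the quoted Algorithm 1 flatten and project the modality representations, compute a contrastive loss, and optimize a weighted sum of the four loss terms. Since the paper's exact formulation is not reproduced in the report, the sketch below is an illustrative assumption only: a symmetric InfoNCE-style contrastive term standing in for L_Contra, plus the weighted combination from step 8. All function names, shapes, weights, and the choice of InfoNCE are hypothetical, not the authors' implementation.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss between paired modality
    embeddings (a simplified stand-in for the paper's L_Contra)."""
    # L2-normalize rows so dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N); diagonal = positive pairs
    # Row-wise log-softmax; the loss pulls each diagonal entry up
    log_p_ab = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_ba = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    # Average the A->B and B->A directions (symmetric contrast)
    return (-np.diag(log_p_ab).mean() - np.diag(log_p_ba).mean()) / 2

def total_loss(l_club, l_recon, l_contra, l_cpc,
               alpha=1.0, beta=1.0, gamma=1.0, lam=1.0):
    """Step 8 of Algorithm 1: L = α·L_CLUB + β·L_Recon + γ·L_Contra + λ·L_CPC.
    The weight values here are placeholders, not the paper's settings."""
    return alpha * l_club + beta * l_recon + gamma * l_contra + lam * l_cpc

# Toy paired batch: modality-B projections are noisy copies of modality-A's
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
z_b = z_a + 0.05 * rng.normal(size=(8, 16))
print(info_nce(z_a, z_b))
```

As a sanity check, the loss for correctly paired batches should be lower than for mispaired ones, which is the property the contrastive pre-training relies on.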