Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models
Authors: Yiyang Fang, Jian Liang, Wenke Huang, He Li, Kehua Su, Mang Ye
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that SEPM significantly improves MLLM performance on emotion-related tasks, providing a resource-efficient and scalable solution for emotion recognition. |
| Researcher Affiliation | Academia | 1National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China 2Taikang Center for Life and Medical Sciences, Wuhan University, Wuhan, China. Correspondence to: Kehua Su <EMAIL>, Mang Ye <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 SEPM. Input: Multimodal Large Language Model M, Coarse-Grained Query Qc, Sample D. Output: Specific emotion category E. |
| Open Source Code | Yes | Our code is available at https://github.com/fuyyyyy/SEPM. |
| Open Datasets | Yes | We evaluate our framework on four emotion datasets, which are annotated across different scenarios and numbers of categories: Emotion6 (Peng et al., 2015), EmoSet (Yang et al., 2023), WebEmo (Panda et al., 2018), and Abstract (Machajdik & Hanbury, 2010). |
| Dataset Splits | No | The paper lists several datasets used for evaluation (Emotion6, EmoSet, WebEmo, Abstract) but does not specify the train/validation/test splits used in its experiments. It mentions zero-shot inference but gives no explicit data-partitioning details for reproducibility. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA 4090 GPUs, each with 24GB of memory. |
| Software Dependencies | No | The paper mentions using LLaVA (Liu et al., 2023) and VILA (Lin et al., 2024) as foundation models, but it does not specify version numbers for these models or for any software libraries or programming languages used in the implementation. |
| Experiment Setup | Yes | The confidence threshold α and drop rate β are set to 0.1 and 0.2 by default, respectively. |
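The Pseudocode and Experiment Setup rows together suggest a coarse-to-fine inference loop gated by the confidence threshold α = 0.1 and drop rate β = 0.2. The paper's exact mechanics are not reproduced here; the sketch below is a hypothetical illustration of how such a two-stage scheme could be wired, with the model interface (`score_fn`), the confidence gate, and the candidate-pruning step all being assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of a coarse-to-fine emotion inference loop.
# ALPHA and BETA match the paper's stated defaults; everything else
# (score_fn interface, gating rule, pruning rule) is an assumption.
from typing import Callable, Dict, List

ALPHA = 0.1  # confidence threshold (paper default)
BETA = 0.2   # drop rate (paper default)


def sepm_infer(
    score_fn: Callable[[str, str, List[str]], Dict[str, float]],
    sample: str,
    coarse_query: str,
    categories: List[str],
) -> str:
    """Two-stage inference: coarse scoring, pruning, then a fine pass."""
    # Stage 1: score all candidate emotions with the coarse-grained query.
    scores = score_fn(sample, coarse_query, categories)
    best, best_p = max(scores.items(), key=lambda kv: kv[1])
    # If the model is already confident enough, answer immediately.
    if best_p >= 1.0 - ALPHA:
        return best
    # Stage 2: drop the lowest-scoring fraction (BETA) of candidates
    # and re-score the survivors with a more specific query.
    n_keep = max(1, int(len(categories) * (1.0 - BETA)))
    survivors = sorted(categories, key=lambda c: scores[c], reverse=True)[:n_keep]
    fine_query = f"{coarse_query} Choose one of: {', '.join(survivors)}."
    fine_scores = score_fn(sample, fine_query, survivors)
    return max(fine_scores.items(), key=lambda kv: kv[1])[0]
```

With a confident scorer the coarse pass answers directly; with a flat score distribution the loop prunes the weakest candidates before the second query, which is one plausible way a confidence threshold and drop rate could interact.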