Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models

Authors: Yiyang Fang, Jian Liang, Wenke Huang, He Li, Kehua Su, Mang Ye

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that SEPM significantly improves MLLM performance on emotion-related tasks, providing a resource-efficient and scalable solution for emotion recognition.
Researcher Affiliation | Academia | (1) National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; (2) Taikang Center for Life and Medical Sciences, Wuhan University, Wuhan, China. Correspondence to: Kehua Su <EMAIL>, Mang Ye <EMAIL>.
Pseudocode | Yes | Algorithm 1 SEPM. Input: Multimodal Large Language Model M, coarse-grained query Qc, sample D. Output: specific emotion category E.
Open Source Code | Yes | Our code is available at https://github.com/fuyyyyy/SEPM.
Open Datasets | Yes | We evaluate our framework on four emotion datasets, annotated across different scenarios and numbers of categories: Emotion6 (Peng et al., 2015), EmoSet (Yang et al., 2023), WebEmo (Panda et al., 2018), and Abstract (Machajdik & Hanbury, 2010).
Dataset Splits | No | The paper lists the evaluation datasets (Emotion6, EmoSet, WebEmo, Abstract) but does not specify the train/validation/test splits used in the experiments. It mentions zero-shot inference, yet gives no explicit data-partitioning details needed for reproducibility.
Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA 4090 GPUs, each with 24 GB of memory.
Software Dependencies | No | The paper uses LLaVA (Liu et al., 2023) and VILA (Lin et al., 2024) as foundation models, but it does not specify version numbers for them or for any other software libraries or programming languages used in the implementation.
Experiment Setup | Yes | The confidence threshold α and drop rate β are set to 0.1 and 0.2 by default, respectively.
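The Algorithm 1 signature (coarse-grained query, then a specific emotion category) together with the reported defaults α = 0.1 and β = 0.2 can be sketched as a generic two-stage zero-shot pipeline. This is a minimal sketch under stated assumptions: the `query_model` interface, the coarse/fine label sets, and the exact way α (confidence threshold) and β (drop rate) enter the loop are all hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical coarse-to-fine emotion inference, illustrating Algorithm 1's
# shape. All names and label sets below are illustrative assumptions.

ALPHA = 0.1  # confidence threshold (paper default)
BETA = 0.2   # drop rate (paper default)

COARSE_LABELS = ["positive", "negative"]  # hypothetical coarse stage
FINE_LABELS = {
    "positive": ["joy", "excitement", "contentment"],        # illustrative
    "negative": ["anger", "fear", "sadness", "disgust"],     # illustrative
}

def predict_emotion(query_model, sample):
    """Two-stage zero-shot prediction: coarse polarity, then fine category.

    query_model(sample, labels) -> dict mapping each label to a confidence.
    """
    # Stage 1: coarse-grained query over a small label set.
    coarse_scores = query_model(sample, COARSE_LABELS)
    coarse = max(coarse_scores, key=coarse_scores.get)

    # Stage 2: fine-grained query restricted to the chosen coarse branch.
    fine_scores = query_model(sample, FINE_LABELS[coarse])

    # Drop the lowest-scoring BETA fraction of candidates, then keep only
    # labels whose confidence clears ALPHA before taking the argmax.
    ranked = sorted(fine_scores.items(), key=lambda kv: kv[1], reverse=True)
    keep = ranked[: max(1, int(len(ranked) * (1 - BETA)))]
    confident = [(lab, s) for lab, s in keep if s >= ALPHA] or keep[:1]
    return max(confident, key=lambda kv: kv[1])[0]

def fake_model(sample, labels):
    # Stand-in for an MLLM query; returns fixed confidences for the demo.
    scores = {"positive": 0.3, "negative": 0.7, "anger": 0.05,
              "fear": 0.6, "sadness": 0.25, "disgust": 0.1}
    return {lab: scores[lab] for lab in labels}

print(predict_emotion(fake_model, "image.jpg"))  # -> fear
```

In this sketch β prunes the weakest fifth of the fine-grained candidates and α filters out low-confidence labels before the final argmax; the actual roles these hyperparameters play in SEPM may differ.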