OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition
Authors: Zheng Lian, Haiyang Sun, Licai Sun, Haoyu Chen, Lan Chen, Hao Gu, Zhuofan Wen, Shun Chen, Siyuan Zhang, Hailiang Yao, Bin Liu, Rui Liu, Shan Liang, Ya Li, Jiangyan Yi, Jianhua Tao
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Benchmark. We build zero-shot benchmarks for OV-MER through extensive experiments and detailed analysis. This task can serve as an important evaluation benchmark for multimodal LLMs (MLLMs), challenging their ability to integrate multimodal clues and capture subtle temporal variations in emotional expression. Experiments. Our intensive experimental results not only demonstrate the strength of our methods but also prove that OV-MER can effectively enhance the presentation ability of emotions and user experience. |
| Researcher Affiliation | Academia | 1Institute of Automation, Chinese Academy of Sciences 2Shanghai Jiao Tong University 3CMVS, University of Oulu 4Inner Mongolia University 5Xi'an Jiaotong-Liverpool University 6Beijing University of Posts and Telecommunications 7Department of Automation, Tsinghua University 8Beijing National Research Center for Information Science and Technology, Tsinghua University. |
| Pseudocode | No | The paper describes a model architecture with mathematical equations in Appendix U but does not present a structured pseudocode or algorithm block. For example: "h_i^m = ReLU(f_i^m W_h^m + b_h^m), m ∈ {a, l, v} (8); h_i = Concat(h_i^a, h_i^l, h_i^v) (9); α_i = softmax(h_i^T W_α + b_α) (10); z_i = h_i α_i (11)". This is a mathematical description, not pseudocode. |
| Open Source Code | Yes | Code and dataset are available at: https://github.com/zeroQiaoba/AffectGPT. |
| Open Datasets | Yes | Code and dataset are available at: https://github.com/zeroQiaoba/AffectGPT. Ultimately, we create a dataset, OV-MERD, which offers a richer set of emotions compared to existing datasets (see Table 1). This dataset is an extension of MER2023 (Lian et al., 2023). |
| Dataset Splits | No | The paper states that OV-MERD is an extension of MER2023 from which a portion of samples were randomly selected for further annotation. However, it does not specify the train/validation/test splits for the OV-MERD dataset used in their experiments. For example: "We randomly selected a subset of MER2023 for further annotation to construct our OV-MERD dataset." |
| Hardware Specification | Yes | All models are implemented in PyTorch, and all inference processes are executed on a 32GB NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions "All models are implemented in PyTorch" but does not specify a version number for PyTorch or any other key software dependencies with version numbers. |
| Experiment Setup | No | The paper discusses various baselines and evaluation metrics, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) for training any of the models used in the experiments. It focuses on zero-shot evaluation and general model performance rather than detailed training configurations. |
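The fusion step quoted in the Pseudocode row (Eqs. 8–11: per-modality ReLU projections, concatenation, softmax attention, weighted sum) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the hidden size, random weights, and the choice to apply attention across the three stacked modality projections are assumptions, since the excerpt gives only the equations.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fusion(f_a, f_l, f_v, hidden=4, seed=0):
    """Sketch of Eqs. (8)-(11): project each modality feature through a
    ReLU layer, stack the projections, score them with a learned vector,
    and return the attention-weighted sum as the fused representation z.
    Weights are random placeholders for illustration only."""
    rng = np.random.default_rng(seed)
    projections = []
    for f in (f_a, f_l, f_v):                        # m in {a, l, v}
        W_h = rng.standard_normal((f.shape[0], hidden))
        b_h = np.zeros(hidden)
        projections.append(np.maximum(f @ W_h + b_h, 0.0))  # Eq. (8)
    H = np.stack(projections)                        # Eq. (9), shape (3, hidden)
    W_alpha = rng.standard_normal((hidden, 1))
    b_alpha = 0.0
    alpha = softmax(H @ W_alpha + b_alpha)           # Eq. (10), one weight per modality
    z = (H * alpha).sum(axis=0)                      # Eq. (11), weighted sum
    return z, alpha.ravel()

# Toy features with different per-modality dimensions.
z, alpha = attention_fusion(np.ones(5), np.ones(6), np.ones(7))
```

Note that the attention weights sum to one across the three modalities, so `z` is a convex combination of the (non-negative) projected modality vectors.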