Utterance-level Emotion Recognition in Conversation with Conversation-level Supervision
Authors: Ximing Li, Yuanchao Dai, Zhiyao Yang, Jinjin Chi, Wanfu Gao, Lin Yuanbo Wu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that the proposed DERC-PL can be on par with existing weakly-supervised learning baselines and supervised learning ERC methods. We conduct extensive experiments to validate the effectiveness of DERC-PL and provide empirical evidence that coarse-grained DERC can be a strong candidate for fine-grained ERC. In this section, we conduct experiments to evaluate DERC-PL, and attempt to answer the following questions: Q1: Can DERC-PL compete with the existing weakly-supervised learning methods in DERC settings? Q2: Can DERC-PL compete with the existing supervised learning ERC methods? |
| Researcher Affiliation | Academia | 1 College of Computer Science and Technology, Jilin University, China; 2 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China; 3 Swansea University, United Kingdom |
| Pseudocode | Yes | Algorithm 1: Computation of DERC-PL |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code, nor does it provide a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | We employ three benchmark ERC datasets: MELD (Poria et al. 2019), IEMOCAP (Busso et al. 2008), and EmoryNLP (Zahiri and Choi 2018). Statistics of these datasets are listed in Table 3. |
| Dataset Splits | Yes | Statistics of these datasets are listed in Table 3. For each dataset, we generate its DERC version by directly adding the utterance-level emotions into the corresponding conversation-level emotion sets. Table 3 (statistics of the benchmark datasets): IEMOCAP — 151 conversations (120 train / 31 test) and 7,433 utterances (5,810 train / 1,623 test); EmoryNLP — 827 conversations (659 train / 89 validation / 79 test) and 9,489 utterances (7,551 train / 954 validation / 984 test); MELD — 1,432 conversations (1,039 train / 114 validation / 280 test) and 13,708 utterances (9,989 train / 1,109 validation / 2,610 test). |
| Hardware Specification | Yes | Our experiments are conducted on Ubuntu 20.04 with a single RTX-4090 GPU with 24G memory. |
| Software Dependencies | No | The paper mentions "Ubuntu 20.04" and the "AdamW optimizer" but does not provide specific version numbers for other key software components like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used in the implementation. |
| Experiment Setup | Yes | For BERT-based methods (i.e., BERT-base+MLP, RGAT (Ishiwatari et al. 2020)) / RoBERTa-based methods (i.e., SACL (Hu et al. 2023), DualGAT (Zhang, Chen, and Chen 2023)), we use the AdamW optimizer (Loshchilov and Hutter 2019), with learning rates of 2e-5 and 1e-4, respectively. The layer dropout rate, batch size, and the number of epochs T are configured to 0.1/0.2, 16/16, and 30/20, respectively. The hyperparameter α is adjusted to 0.8 for IEMOCAP (Busso et al. 2008), 0.3 for EmoryNLP (Zahiri and Choi 2018), and 0.4 for MELD (Poria et al. 2019). |
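For reference, the hyperparameters reported in the Experiment Setup row can be collected into a small configuration helper. This is a sketch: the function name `get_config` and the dictionary layout are our own illustration, and only the numeric values come from the paper's description.

```python
# Sketch of the reported DERC-PL training configuration.
# `get_config`, the dict layout, and key names are illustrative assumptions;
# the values (learning rate, dropout, batch size, epochs, alpha) are the
# ones quoted from the paper's Experiment Setup description.

def get_config(backbone: str, dataset: str) -> dict:
    """Return the reported hyperparameters for a backbone family and dataset.

    backbone: "bert" (BERT-base+MLP, RGAT) or "roberta" (SACL, DualGAT).
    dataset:  "IEMOCAP", "EmoryNLP", or "MELD".
    """
    # Per-backbone settings: learning rate, layer dropout, batch size, epochs T.
    by_backbone = {
        "bert":    {"lr": 2e-5, "dropout": 0.1, "batch_size": 16, "epochs": 30},
        "roberta": {"lr": 1e-4, "dropout": 0.2, "batch_size": 16, "epochs": 20},
    }
    # Dataset-specific hyperparameter alpha.
    alpha = {"IEMOCAP": 0.8, "EmoryNLP": 0.3, "MELD": 0.4}

    cfg = dict(by_backbone[backbone])
    cfg["optimizer"] = "AdamW"  # AdamW is used for both backbone families
    cfg["alpha"] = alpha[dataset]
    return cfg

print(get_config("roberta", "MELD")["lr"])  # 0.0001
```

This keeps the "0.1/0.2, 16/16, and 30/20" shorthand from the paper explicit: the first value of each pair applies to the BERT-based methods, the second to the RoBERTa-based ones.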