Structural Entropy Guided Probabilistic Coding
Authors: Xiang Huang, Hao Peng, Li Sun, Hui Lin, Chunyang Liu, Jiang Cao, Philip S. Yu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC compared to other state-of-the-art models in terms of effectiveness, generalization capability, and robustness to label noise. |
| Researcher Affiliation | Collaboration | Xiang Huang (1), Hao Peng (1,2)*, Li Sun (3), Hui Lin (4), Chunyang Liu (5), Jiang Cao (6), Philip S. Yu (7). 1: Beihang University; 2: Guangdong Laboratory of Artificial Intelligence and Digital Economy; 3: North China Electric Power University; 4: China Academy of Electronics and Information Technology; 5: Didi Chuxing; 6: Academy of Military Sciences; 7: University of Illinois Chicago |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/SELGroup/SEPC |
| Open Datasets | Yes | Following Hu et al. (2024), we evaluate SEPC on 10 classification task datasets and 2 regression task datasets. For classification tasks, 7 datasets about tweet semantic analysis are used: Emoji (Barbieri et al. 2018), Emotion (Mohammad et al. 2018), Hate (Basile et al. 2019), Irony (Van Hee, Lefever, and Hoste 2018), Offensive (Zampieri et al. 2019), Sentiment (Rosenthal, Farra, and Nakov 2017), and Stance (Mohammad et al. 2016). Additionally, we also experiment on three emotion-related datasets from different domains: ISEAR (Scherer and Wallbott 1994), MELD (Poria et al. 2019), and Go Emotions (Demszky et al. 2020). For regression tasks, we utilize STS-B (Cer et al. 2017) and Claire (Roth, Anthonio, and Sauer 2022) for evaluation. |
| Dataset Splits | No | The paper mentions using specific datasets and a 'test set' in its evaluation. It also describes varying the training data percentage (e.g., 'randomly select 90%, 70%, 50%, and 30% of the training data'). However, it does not provide explicit details about the initial train/validation/test splits (e.g., '80/10/10 split' or specific sample counts) for each of the listed datasets, nor does it explicitly reference standard splits for all of them within this text. |
| Hardware Specification | Yes | All experiments are conducted on two NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The training epoch number is 20, and the maximum patience for early stopping is 5 epochs. The learning rate is 5e-5 in all datasets. A linear learning rate warm-up is applied over the first 10% of the training data. The batch size is uniformly set to 128. The trade-off parameter ω and the weight parameter ϱ are searched from {1e-2, 1e-1, 1, 10}. |
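The reported experiment setup can be sketched as a configuration plus a grid search over the two tuned parameters. This is a minimal illustrative sketch: the `TrainConfig` dataclass and variable names are hypothetical (not from the paper's released code); only the values come from the setup described above.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class TrainConfig:
    # Values reported in the paper's experiment setup; the class itself
    # is an illustrative container, not the authors' actual code.
    epochs: int = 20              # total training epochs
    patience: int = 5             # early-stopping patience, in epochs
    learning_rate: float = 5e-5   # same for all datasets
    warmup_fraction: float = 0.10 # linear LR warm-up over first 10% of training
    batch_size: int = 128

# The trade-off parameter omega and weight parameter rho are each
# searched over the same four values.
SEARCH_GRID = [1e-2, 1e-1, 1, 10]

configs = [(TrainConfig(), omega, rho)
           for omega, rho in product(SEARCH_GRID, SEARCH_GRID)]
print(len(configs))  # 16 (omega, rho) combinations per dataset
```

A full sweep therefore trains 16 runs per dataset, with early stopping capping each run at 5 stagnant epochs.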