Enhancing Relation Extraction via Supervised Rationale Verification and Feedback
Authors: Yongqi Li, Xin Miao, Shen Zhou, Mayi Xu, Yuyang Ren, Tieyun Qian
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of our framework over existing methods. Table 1 reports the experimental results with various initial demonstration selection strategies on Llama-2-7b-chat on the SemEval, TACRED, and Re-TACRED datasets. |
| Researcher Affiliation | Collaboration | Yongqi Li1, Xin Miao1, Shen Zhou1, Mayi Xu1, Yuyang Ren1,3, Tieyun Qian1,2* 1School of Computer Science, Wuhan University, China 2Intellectual Computing Laboratory for Cultural Heritage, Wuhan University, China 3Research Institute of Nuclear Power Operation, China |
| Pseudocode | No | The paper describes the proposed method in narrative text and uses figures to illustrate the framework and causal models, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/NLPGM/SRVF |
| Open Datasets | Yes | We adopt three commonly used datasets for RE, including SemEval (Hendrickx et al. 2010), TACRED (Zhang et al. 2017), and Re-TACRED (Stoica, Platanios, and Póczos 2021). Also, DocRED (Yao et al. 2019) and Re-DocRED (Tan et al. 2022). |
| Dataset Splits | Yes | Hence we adopt the k-shot (k ∈ {5, 10, 20, 50}) settings to validate the effectiveness of the proposed method. |
| Hardware Specification | No | The paper evaluates its method using various LLMs (Llama-2-7b-chat, Llama-2-70b-chat, Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B-Instruct, GPT-3.5-turbo), but does not specify the underlying hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies or their version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | For Self-Consistency, GRACE, and ours, the number of iterations or candidate responses is set to 5 for fairness. For Self-Refine, the iteration number is set to 1 since we find that more iteration rounds result in performance degradation. Here we adopt the dot product as the similarity function sim() and add a temperature hyper-parameter τ to focus more on difficult pairs (Chen et al. 2020). |
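The temperature-scaled dot-product similarity quoted in the experiment setup can be sketched as follows. This is a minimal illustration of the general technique from Chen et al. 2020 (temperature scaling to emphasize difficult pairs), not the paper's actual implementation; the function name and the value of τ are assumptions.

```python
import numpy as np

def scaled_dot_similarity(a, b, tau=0.1):
    """Dot-product similarity divided by a temperature tau.

    A smaller tau sharpens the resulting softmax over similarities,
    which makes a contrastive loss focus more on difficult pairs.
    The name and default tau=0.1 are illustrative, not from the paper.
    """
    return np.dot(a, b) / tau

# Example: temperature scaling amplifies similarity differences.
u = np.array([1.0, 0.0])
v = np.array([0.8, 0.6])
print(scaled_dot_similarity(u, v, tau=0.1))  # 0.8 / 0.1 ≈ 8.0
```

In a contrastive objective, these scaled similarities would feed a softmax over positive and negative pairs; the temperature is the single knob controlling how strongly hard negatives dominate the gradient.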