Enhancing Relation Extraction via Supervised Rationale Verification and Feedback

Authors: Yongqi Li, Xin Miao, Shen Zhou, Mayi Xu, Yuyang Ren, Tieyun Qian

AAAI 2025

Reproducibility assessment (Variable / Result / LLM Response):
Research Type: Experimental. LLM Response: "Extensive experiments prove that our proposed framework significantly outperforms existing methods. Extensive experiments demonstrate the superiority of our framework over existing methods. Table 1 reports the experimental results with various initial demonstration selection strategies on Llama-2-7b-chat on the SemEval, TACRED, and Re-TACRED datasets."
Researcher Affiliation: Collaboration. LLM Response: "Yongqi Li1, Xin Miao1, Shen Zhou1, Mayi Xu1, Yuyang Ren1,3, Tieyun Qian1,2*. 1School of Computer Science, Wuhan University, China; 2Intellectual Computing Laboratory for Cultural Heritage, Wuhan University, China; 3Research Institute of Nuclear Power Operation, China. EMAIL, EMAIL, EMAIL"
Pseudocode: No. LLM Response: "The paper describes the proposed method in narrative text and uses figures to illustrate the framework and causal models, but does not include any clearly labeled pseudocode or algorithm blocks."
Open Source Code: Yes. LLM Response: "Code: https://github.com/NLPGM/SRVF"
Open Datasets: Yes. LLM Response: "We adopt three commonly used datasets for RE, including SemEval (Hendrickx et al. 2010), TACRED (Zhang et al. 2017), and Re-TACRED (Stoica, Platanios, and Póczos 2021)." The paper also uses DocRED (Yao et al. 2019) and Re-DocRED (Tan et al. 2022).
Dataset Splits: Yes. LLM Response: "Hence we adopt the k-shot (k ∈ {5, 10, 20, 50}) settings to validate the effectiveness of the proposed method."
Hardware Specification: No. LLM Response: "The paper evaluates its method using various LLMs (Llama-2-7b-chat, Llama-2-70b-chat, Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B-Instruct, GPT-3.5-turbo), but does not specify the underlying hardware (e.g., GPU models, CPU types) used for running the experiments."
Software Dependencies: No. LLM Response: "The paper does not explicitly state specific software dependencies or their version numbers, such as the programming languages, libraries, or frameworks used for implementation."
Experiment Setup: Yes. LLM Response: "For Self-Consistency, GRACE, and ours, the number of iterations or candidate responses is set to 5 for fairness. For Self-Refine, the iteration number is set to 1 since we find that more iteration rounds result in performance degradation. Here we adopt the dot product as the similarity function sim(·) and add a temperature hyper-parameter τ to focus more on difficult pairs (Chen et al. 2020)."
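The temperature-scaled dot-product similarity quoted above can be sketched as follows. This is a generic illustration in the style of Chen et al. (2020), not the authors' released implementation; the function name, the default τ, and the softmax weighting are assumptions added for clarity.

```python
import numpy as np

def scaled_similarities(anchor, candidates, tau=0.1):
    """Dot-product similarity between an anchor vector and each
    candidate, divided by a temperature tau.

    A small tau sharpens the softmax over pairs, so difficult
    (high-similarity) pairs dominate the resulting weights.
    Illustrative sketch only; names and tau are not from the paper.
    """
    sims = candidates @ anchor / tau          # shape: (n_candidates,)
    # Softmax (with max-subtraction for numerical stability) turns
    # the scaled similarities into per-pair weights.
    weights = np.exp(sims - sims.max())
    return sims, weights / weights.sum()
```

With a small τ, the weight mass concentrates on the candidate most similar to the anchor, which is the "focus more on difficult pairs" effect the quoted setup describes.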