Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering

Authors: Yifan Lu, Yigeng Zhou, Jing Li, Yequan Wang, Xuebo Liu, Daojing He, Fangming Liu, Min Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on benchmarks show that KEDKG surpasses previous state-of-the-art models, delivering more accurate and reliable answers in environments with dynamic information. We conduct extensive experiments across various LLMs and datasets to validate the effectiveness and usability of KEDKG. Our empirical results and analysis demonstrate that KEDKG significantly outperforms the advanced existing baselines, achieving superior performance.
Researcher Affiliation | Academia | Harbin Institute of Technology, Shenzhen, China; Beijing Academy of Artificial Intelligence, Beijing, China; Pengcheng Laboratory, Shenzhen, China
Pseudocode | No | The paper describes the methodology using text and a system diagram (Figure 2), but does not contain a dedicated pseudocode or algorithm block.
Open Source Code | No | The paper does not contain an unambiguous statement or a direct link indicating the release of source code for the methodology described.
Open Datasets | Yes | We evaluate KEDKG using the MQuAKE dataset. MQuAKE is a knowledge editing benchmark for multi-hop QA, comprising MQuAKE-CF based on counterfactual editing and MQuAKE-T based on temporal knowledge updates. We use MQuAKE-CF as the training set, which contains 9,218 data points, and MQuAKE-CF-3k as the test set, which includes 3,000 data points.
Dataset Splits | Yes | We use MQuAKE-CF as the training set, which contains 9,218 data points, and MQuAKE-CF-3k as the test set, which includes 3,000 data points.
Hardware Specification | Yes | All our experiments are carried out on a machine with 8 NVIDIA A800-SXM4-80G GPUs.
Software Dependencies | No | We train an entity detector and a relation detector based on the DistilBERT (Sanh 2019) model and fine-tune the Llama 2-7B model for the question decomposition task. In addition, we use REBEL (Cabot and Navigli 2021) as our relation extraction model and the spaCy entity linker as the entity linking model. While specific models are named, explicit version numbers for DistilBERT, REBEL, and the spaCy entity linker are not provided.
Experiment Setup | Yes | If the highest probability p exceeds a threshold α, which is set to 0.5 in our experiments, we can retrieve the corresponding fact triple (s, r, o) and use o as the retrieval answer. We train an entity detector and a relation detector based on the DistilBERT (Sanh 2019) model and fine-tune the Llama 2-7B model for the question decomposition task.
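The thresholded retrieval rule quoted above can be sketched as a small, self-contained example. Note this is a minimal illustration under assumed inputs: the toy knowledge graph, entity/relation names, and candidate probabilities below are invented for demonstration, and KEDKG's actual detectors are DistilBERT-based classifiers rather than a precomputed score dictionary.

```python
# Sketch of threshold-based fact retrieval over an edited knowledge
# graph, following the rule: if the top relation probability p exceeds
# alpha (0.5), retrieve the triple (s, r, o) and return o's triple.
# All graph contents and scores here are illustrative assumptions.

ALPHA = 0.5  # retrieval threshold from the paper's setup

# Toy dynamic knowledge graph: (subject, relation) -> object.
knowledge_graph = {
    ("United Kingdom", "head of government"): "Rishi Sunak",  # edited fact
    ("Rishi Sunak", "country of citizenship"): "United Kingdom",
}

def retrieve(subject, relation_scores):
    """Pick the highest-probability relation for the subject; return the
    fact triple (s, r, o) only if that probability exceeds ALPHA."""
    relation, p = max(relation_scores.items(), key=lambda kv: kv[1])
    if p > ALPHA and (subject, relation) in knowledge_graph:
        s, r = subject, relation
        o = knowledge_graph[(s, r)]
        return (s, r, o)
    return None  # no confident edited fact; fall back to the LLM

# Hypothetical relation-detector output for one decomposed sub-question:
confident = {"head of government": 0.91, "capital": 0.06}
print(retrieve("United Kingdom", confident))
# -> ('United Kingdom', 'head of government', 'Rishi Sunak')

uncertain = {"head of government": 0.30, "capital": 0.40}
print(retrieve("United Kingdom", uncertain))
# -> None (best score 0.40 is below alpha = 0.5)
```

The threshold acts as an abstention mechanism: when no relation is scored confidently, the sketch returns None instead of a low-confidence triple, mirroring the paper's fallback to the model's own knowledge.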