SetKE: Knowledge Editing for Knowledge Elements Overlap

Authors: Yifan Wei, Xiaoyan Yu, Ran Song, Hao Peng, Angsheng Li

IJCAI 2025

Reproducibility assessment. Each entry below gives the reproducibility variable, the assessed result, and the LLM's supporting response (excerpted or paraphrased from the paper).
Research Type: Experimental
"Experimental results demonstrate that SetKE outperforms existing methods in KEO scenarios on mainstream LLMs. Additionally, we introduce EDITSET, a dataset containing KEO triplets, providing a comprehensive benchmark."
Researcher Affiliation: Academia
¹State Key Laboratory of CCSE, School of Computer Science and Engineering, Beihang University; ²School of Computer Science and Technology, Beijing Institute of Technology; ³Kunming University of Science and Technology.
Pseudocode: Yes
"The Hungarian algorithm guarantees finding the optimal matching in O(N³) time complexity, as shown in Appendix Algorithm 1."
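The optimal-matching step that the paper's Appendix Algorithm 1 solves with the Hungarian algorithm can be illustrated with a small sketch. For brevity, this sketch finds the same minimum-cost assignment by brute-force search over permutations (O(N!)); the Hungarian algorithm reaches the identical optimum in O(N³). The cost matrix here is a hypothetical example, not data from the paper.

```python
from itertools import permutations


def optimal_matching(cost):
    """Return (assignment, total_cost) minimizing sum of cost[i][perm[i]].

    Brute-force O(N!) check for small N; the Hungarian algorithm used in
    the paper's Appendix Algorithm 1 computes the same optimum in O(N^3).
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost


# Hypothetical 3x3 cost matrix: cost[i][j] = cost of pairing element i with slot j.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = optimal_matching(cost)
```

For realistic N, `scipy.optimize.linear_sum_assignment` implements the same optimal assignment with the polynomial-time algorithm.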
Open Source Code: No
The paper does not contain an explicit statement about releasing open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets: Yes
"We propose a novel formulation of Knowledge Set Editing (KSE) and construct a new dataset, EDITSET, to facilitate in-depth exploration of Knowledge Element Overlap (KEO)... Building on this observation, we collect KEO instances from Wikidata to construct a new dataset, EDITSET, enabling a more comprehensive exploration of KEO in KE. The dataset comprises over 700 relation types, with our study focusing on the 31 most common ones, consistent with prior research [Levy et al., 2017; Elazar et al., 2021; Meng et al., 2022a; Zhong et al., 2023; Wei et al., 2024; Yin et al., 2024; Ma et al., 2024]."
Dataset Splits: Yes
"The counterfactual prompt is employed to assess Efficacy, the paraphrase prompt for Generalization, and the neighborhood prompt for Locality. ... The EDITSET dataset consists of three types of prompts; Counter.P., Para.P., and Neigh.P. denote Counterfactual Prompt, Paraphrase Prompt, and Neighborhood Prompt, respectively."
Hardware Specification: No
The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies: No
The paper mentions using large language models such as GPT2-Large, GPT2-XL, and GPT-J, but does not provide specific software dependencies (e.g., library names with version numbers such as PyTorch, TensorFlow, or CUDA) used for implementation.
Experiment Setup: Yes
"Evaluation Metrics: The evaluation metrics for the new formulation of KSE remain consistent with previous works (where the object is singular) [Meng et al., 2022a; Meng et al., 2022b]... Language Models: We employ three widely adopted autoregressive language models, namely GPT2-Large (760M), GPT2-XL (1.5B), and GPT-J (6B) [Radford et al., 2019], as the base language models to perform editing and assess the effectiveness of the KE approaches. Baselines: We select the following approaches: FT-W is a basic fine-tuning method; KN [Dai et al., 2022]...; MEND [Mitchell et al., 2021]...; ROME [Meng et al., 2022a]...; MEMIT [Meng et al., 2022b]...; PMET [Li et al., 2024]..."
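The three metrics named above map one-to-one onto EDITSET's prompt types: Efficacy is the edit success rate on counterfactual prompts, Generalization the success rate on paraphrase prompts, and Locality the rate at which neighborhood prompts remain correct. A minimal sketch of that bookkeeping, with hypothetical record fields (`prompt_type`, `success`) standing in for whatever the evaluation harness actually produces:

```python
def score_edits(records):
    """Aggregate per-prompt success flags into the three KE metrics.

    `records` is a list of dicts with hypothetical fields:
      prompt_type: 'counterfactual' | 'paraphrase' | 'neighborhood'
      success: True if the model's answer matched the expected one.
    """
    metric_for = {
        "counterfactual": "efficacy",    # did the edit take effect?
        "paraphrase": "generalization",  # does the edit survive rephrasing?
        "neighborhood": "locality",      # are unrelated facts untouched?
    }
    totals = {m: [0, 0] for m in metric_for.values()}  # metric -> [hits, count]
    for r in records:
        metric = metric_for[r["prompt_type"]]
        totals[metric][0] += int(r["success"])
        totals[metric][1] += 1
    return {m: hits / count for m, (hits, count) in totals.items() if count}


# Tiny hypothetical run: 2 counterfactual, 1 paraphrase, 1 neighborhood prompt.
records = [
    {"prompt_type": "counterfactual", "success": True},
    {"prompt_type": "counterfactual", "success": False},
    {"prompt_type": "paraphrase", "success": True},
    {"prompt_type": "neighborhood", "success": True},
]
scores = score_edits(records)
```

This separation is what lets the benchmark distinguish an edit that merely memorized one prompt (high Efficacy, low Generalization) from one that overwrote unrelated knowledge (low Locality).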