Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing

Authors: Tianci Liu, Ruirui Li, Zihan Dong, Hui Liu, Xianfeng Tang, Qingyu Yin, Linjun Zhang, Haoyu Wang, Jing Gao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across four editing methods, two LLMs, and diverse scenarios demonstrate the effectiveness and versatility of our method.
Researcher Affiliation | Collaboration | (1) Purdue University, (2) Amazon, (3) Rutgers University, (4) University at Albany.
Pseudocode | Yes | Algorithm 1: OVERTONE Training Paradigm.
Open Source Code | No | The paper states that "All of our experiments are run on EasyEdit (Wang et al., 2024e)", a third-party framework used by the authors. It does not provide a link to, or an explicit statement about releasing, the source code for the proposed method, OVERTONE.
Open Datasets | Yes | Following Wang et al. (2023b) and Zhang et al. (2024c), we edit different kinds of knowledge: WikiData_recent, WikiData_counterfact (Cohen et al., 2024), WikiBio (Hartvigsen et al., 2024), and ZsRE (Yao et al., 2023). Besides these four popular benchmarks, we also explore the more complex MQuAKE (Zhong et al., 2023; Wang et al., 2024f).
Dataset Splits | No | The paper uses well-known benchmarks (ZsRE, WikiData_recent, WikiData_counterfact, WikiBio, and MQuAKE). While these benchmarks typically have predefined splits, the paper does not explicitly state the training/validation/test splits used, their percentages, or a specific split methodology in the main text or appendices.
Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as specific GPU or CPU models.
Software Dependencies | No | The paper mentions EasyEdit as its framework but does not list specific software dependencies (e.g., programming languages, libraries, or other tools) with version numbers.
Experiment Setup | Yes | FT-M on MQuAKE: layers to tune (20, 21, 22, 23, 24), learning rate 1e-3, others unchanged. LoRA on MQuAKE: rank 12, 50 iterations, others unchanged. MELO: initial radius for each code in the codebook set to 60 for LLaMA 2 and 30 for LLaMA 3. Generation: temperature 0.1; maximum length 30 for single-hop questions and 200 for multi-hop questions.
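The reported hyperparameters can be collected into configuration objects for a re-run. A minimal sketch follows; the dictionary keys and structure are illustrative assumptions, not EasyEdit's actual configuration schema, and only the numeric values come from the paper.

```python
# Hypothetical config layout; key names are assumptions, values are from the paper.
# FT-M on MQuAKE: tune layers 20-24 with learning rate 1e-3.
FT_M_CONFIG = {
    "layers_to_tune": [20, 21, 22, 23, 24],
    "learning_rate": 1e-3,
}

# LoRA on MQuAKE: rank 12, 50 iterations.
LORA_CONFIG = {
    "rank": 12,
    "num_iterations": 50,
}

# MELO: codebook radius depends on the base model; generation settings
# depend on the question type (single-hop vs. multi-hop).
MELO_CONFIG = {
    "initial_radius": {"llama-2": 60, "llama-3": 30},
    "generation": {
        "temperature": 0.1,
        "max_length": {"single_hop": 30, "multi_hop": 200},
    },
}
```

Keeping per-method settings in separate objects like this makes it easy to verify each value against the paper's appendix before launching a reproduction run.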