WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs

Authors: Lukas Thede, Karsten Roth, Matthias Bethge, Zeynep Akata, Thomas Hartvigsen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Using WikiBigEdit, we thoroughly analyze the capability of existing lifelong knowledge editing methods to conduct lifelong edits at scale; contrasted against retrieval augmentation and continual finetuning to understand limits in relation to other established approaches."
Researcher Affiliation | Academia | ¹Tübingen AI Center, University of Tübingen; ²Helmholtz Munich; ³Munich Center for Machine Learning (MCML); ⁴Technical University of Munich; ⁵University of Virginia.
Pseudocode | No | The paper describes methods and pipelines but does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Code available at https://github.com/ExplainableML/WikiBigEdit."
Open Datasets | Yes | "We first introduce WikiBigEdit; a large-scale benchmark of real-world Wikidata edits, built to automatically extend lifelong for future-proof benchmarking. In its first instance, it includes over 500K question-answer pairs for knowledge editing alongside a comprehensive evaluation pipeline."
Dataset Splits | Yes | "These updates are grouped into B sequential batches [U_1, U_2, ..., U_B], where each batch can encompass anything from a single edit to multiple. For each batch update b (also denoted as timestep in this work), the model f^(b-1), trained on updates from prior batches U_{<b}, is further updated with the current batch U_b to produce f^b. ... SQA_locality and SQA_mhop are used for evaluation, while SQA_changed constitutes the respective fact-based training data."
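The sequential update protocol quoted above (each timestep edits the model produced by the previous timestep) can be sketched in a few lines of Python. The function and variable names here are illustrative assumptions, not identifiers from the paper's code:

```python
def lifelong_edit(model, batches, edit_fn):
    """Apply edit batches sequentially: f^b = edit_fn(f^(b-1), U_b).

    `model` is the initial model f^0, `batches` is the sequence
    [U_1, ..., U_B], and `edit_fn` is any editing method that returns
    the model updated with one batch of edits.
    """
    snapshots = []  # f^1, ..., f^B, each evaluated after its timestep
    for batch in batches:
        model = edit_fn(model, batch)
        snapshots.append(model)
    return model, snapshots
```

Any editing method (adapter training, retrieval augmentation, continual finetuning) plugs in as `edit_fn`, which is what lets the benchmark compare them under one protocol.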
Hardware Specification | Yes | "All experiments are performed on a compute cluster equipped with Nvidia A100 and H100 GPUs, leveraging PyTorch (Paszke et al., 2019) and building on the EasyEdit codebase (Zhang et al., 2024)."
Software Dependencies | No | The paper mentions PyTorch and the Annoy library but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | "Adapters are trained for 10 epochs on each timestep. ... Cosine learning rate scheduling with warmup is employed during training, with a fixed number of 10 epochs per batch. ... For the main experiments, k = 2 was chosen to enable effective multi-hop reasoning while keeping the context length manageable. ... After training of each timestep, current adapter weights are simply merged into preceding adapter weights using an interpolation weight of 0.25."
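The adapter-merging step quoted above can be sketched as a per-parameter linear interpolation. This is a minimal sketch: the function name and the dict-of-weights representation are assumptions, and it assumes the interpolation weight 0.25 is applied to the current (newly trained) adapter:

```python
def merge_adapters(prev_weights, curr_weights, alpha=0.25):
    """Merge current adapter weights into the preceding ones:
    merged[name] = (1 - alpha) * prev[name] + alpha * curr[name]."""
    return {
        name: (1 - alpha) * prev_weights[name] + alpha * curr_weights[name]
        for name in prev_weights
    }
```

A small `alpha` keeps the merged adapter close to the accumulated weights from earlier timesteps, which limits how much a single batch of edits can overwrite previously edited knowledge.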