WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs
Authors: Lukas Thede, Karsten Roth, Matthias Bethge, Zeynep Akata, Thomas Hartvigsen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using WikiBigEdit, we thoroughly analyze the capability of existing lifelong knowledge editing methods to conduct lifelong edits at scale; contrasted against retrieval augmentation and continual finetuning to understand limits in relation to other established approaches. |
| Researcher Affiliation | Academia | ¹Tübingen AI Center, University of Tübingen, ²Helmholtz Munich, ³Munich Center for Machine Learning (MCML), ⁴Technical University of Munich, ⁵University of Virginia. |
| Pseudocode | No | The paper describes methods and pipelines but does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/ExplainableML/WikiBigEdit. |
| Open Datasets | Yes | We first introduce WikiBigEdit; a large-scale benchmark of real-world Wikidata edits, built to automatically extend lifelong for future-proof benchmarking. In its first instance, it includes over 500K question-answer pairs for knowledge editing alongside a comprehensive evaluation pipeline. |
| Dataset Splits | Yes | These updates are grouped into B sequential batches [U_1, U_2, ..., U_B], where each batch can encompass anything from a single edit to multiple. For each batch update b (also denoted as timestep in this work), the model f^{b-1}, trained on updates from prior batches U_{<b}, is further updated with the current batch U_b to produce f^b. ... SQA_locality and SQA_mhop are used for evaluation, while SQA_changed constitutes the respective fact-based training data. |
| Hardware Specification | Yes | All experiments are performed on a compute cluster equipped with Nvidia A100 and H100 GPUs, leveraging PyTorch (Paszke et al., 2019) and building on the EasyEdit codebase (Zhang et al., 2024). |
| Software Dependencies | No | The paper mentions PyTorch and the Annoy library but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Adapters are trained for 10 epochs on each timestep. ... Cosine learning rate scheduling with warmup is employed during training, with a fixed number of 10 epochs per batch. ... For the main experiments, k = 2 was chosen to enable effective multi-hop reasoning while keeping the context length manageable. ... After training of each timestep, current adapter weights are simply merged into preceding adapter weights using an interpolation weight of 0.25. |
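The Dataset Splits row quotes the paper's sequential update protocol: starting from model f^0, each timestep b applies batch U_b to the model produced by all prior batches. A minimal sketch of that loop, with the model reduced to a hypothetical question-to-answer dict and `apply_edit` standing in for whichever editing method is under evaluation:

```python
# Sketch of the lifelong editing protocol: f^b = Edit(f^{b-1}, U_b).
# The "model" here is a plain dict for illustration only; in the paper,
# f is an LLM and the edit operator is a knowledge-editing method.

def apply_edit(model: dict, edit: tuple) -> dict:
    """Apply one (question, answer) edit; stand-in for a real editor."""
    question, answer = edit
    updated = dict(model)  # keep f^{b-1} intact, return the edited copy
    updated[question] = answer
    return updated

def lifelong_edit(model: dict, batches: list) -> dict:
    """Fold B sequential update batches [U_1, ..., U_B] into the model."""
    for batch in batches:      # batch b = U_b, applied at timestep b
        for edit in batch:     # a batch may hold one edit or many
            model = apply_edit(model, edit)
    return model

# Toy usage: two batches of real-world fact updates, where the second
# batch revises a fact edited in the first.
f0 = {"Capital of Australia?": "Canberra"}
batches = [
    [("PM of the UK?", "Rishi Sunak")],
    [("PM of the UK?", "Keir Starmer"),
     ("Host of the 2028 Olympics?", "Los Angeles")],
]
fB = lifelong_edit(f0, batches)
```

Later edits overwrite earlier ones for the same question, which mirrors why the benchmark evaluates retention after every timestep rather than only at the end.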
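The Experiment Setup row states that, after each timestep, current adapter weights are merged into the preceding adapter weights with an interpolation weight of 0.25. A hedged sketch of that step, assuming the 0.25 weight applies to the *current* adapter (the paper's phrasing leaves the direction ambiguous) and using flat per-parameter lists in place of tensors:

```python
# Sketch of per-timestep adapter merging by linear interpolation.
# ASSUMPTION: alpha = 0.25 weights the current adapter, so
#   merged = 0.75 * previous + 0.25 * current
# (the paper does not state which side receives the 0.25 weight).

def merge_adapters(previous: dict, current: dict, alpha: float = 0.25) -> dict:
    """Elementwise merge: (1 - alpha) * previous + alpha * current."""
    assert previous.keys() == current.keys(), "adapters must share parameter names"
    return {
        name: [(1 - alpha) * p + alpha * c
               for p, c in zip(previous[name], current[name])]
        for name in previous
    }

# Toy usage with two hypothetical adapter parameters.
prev = {"lora_A": [1.0, 2.0], "lora_B": [0.0, 0.0]}
curr = {"lora_A": [3.0, 6.0], "lora_B": [4.0, 8.0]}
merged = merge_adapters(prev, curr)
# merged["lora_A"] -> [1.5, 3.0], merged["lora_B"] -> [1.0, 2.0]
```

Keeping most of the mass (0.75) on the accumulated adapter is what lets earlier edits survive many timesteps while each new batch still shifts the weights.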