Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization

Authors: Phillip Huang Guo, Aaquib Syed, Abhay Sheshadri, Aidan Ewart, Gintare Karolina Dziugaite

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a rigorous evaluation of several standard editing approaches on factual recall tasks, and we identify mechanisms for factual lookup and attribute extraction on Gemma-7B, Gemma-2-9B, and Llama-3-8B. We demonstrate that gradient-based editing localized on the factual lookup mechanism is more robust than OT localizations and baselines across multiple datasets, models, and evaluations. Our experiments are designed to test the effectiveness of localization for editing of facts.
Researcher Affiliation | Collaboration | 1 Work done while at University of Maryland, College Park; 2 University of Maryland, College Park; 3 Georgia Institute of Technology; 4 University of Bristol; 5 Google DeepMind. Correspondence to: Aaquib Syed <EMAIL>, Phillip Guo <EMAIL>, Gintare Karolina Dziugaite <EMAIL>.
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper. Methods are described in prose.
Open Source Code | No | The paper does not contain an explicit statement about releasing its source code, nor a link to a code repository for the methodology described.
Open Datasets | Yes | We focus on editing subsets of two datasets: (1) the Sports Facts dataset from Nanda et al. (2023), which contains subject-sport relations across three sports categories for 1567 athletes, and (2) the CounterFact dataset from Meng et al. (2023).
Dataset Splits | Yes | To increase the comprehensiveness of our evaluation, we run experiments with different forget set sizes: 16 athletes and 64 athletes. We replicate the methodology of Deeb & Roger (2024), splitting our forget sets into two independent halves, retraining with half of the ground truth labels, and evaluating on the other half.
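The split procedure quoted above can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `split_forget_set` and the fixed seed are assumptions, and the 64-athlete forget set size comes from the quoted row.

```python
import random

def split_forget_set(forget_ids, seed=0):
    """Split a forget set into two independent halves, following the
    Deeb & Roger (2024)-style protocol quoted in the row above:
    retrain on the ground-truth labels of one half, evaluate on the other.
    The seed is a hypothetical choice for reproducibility of the shuffle."""
    ids = list(forget_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]

# Example with the 64-athlete forget set size used in the paper.
retrain_half, eval_half = split_forget_set(range(64))
```

The two halves are disjoint by construction, so evaluation athletes never appear in the retraining labels.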
Hardware Specification | No | For Gemma-2-9b, we are forced to use an 8-bit optimizer to fit our training in the memory of 1 GPU. This states the number of GPUs but not the specific GPU model or other detailed specifications required for reproducibility.
Software Dependencies | No | We fine-tune using... an AdamW optimizer (Kingma & Ba, 2017) with 0 weight decay and a cosine annealing scheduler. For Gemma-2-9b, we are forced to use an 8-bit optimizer to fit our training in the memory of 1 GPU. While the paper mentions the AdamW and 8-bit optimizers, it does not specify versions for any key software components or libraries (e.g., PyTorch, TensorFlow, Python version) that would enable reproducible environment setup.
Experiment Setup | Yes | Across all tasks except Sequential-CounterFact-Editing and all models, we fine-tune using 50 iterations of batch size 4 with 16 accumulation steps, using an AdamW optimizer (Kingma & Ba, 2017) with 0 weight decay and a cosine annealing scheduler. Table 6 has all learning rates used and Table 7 has all injection loss coefficients used.
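The quoted hyperparameters can be summarized in a small sketch. This is illustrative only: the base learning rate below is a placeholder (the actual per-model values are in the paper's Table 6), and the function name is an assumption; the cosine formula matches the standard cosine-annealing schedule decaying to zero.

```python
import math

TOTAL_ITERS = 50   # 50 fine-tuning iterations (paper setup)
BATCH_SIZE = 4     # batch size 4
ACCUM_STEPS = 16   # 16 gradient-accumulation steps
BASE_LR = 2e-5     # placeholder; real learning rates are in the paper's Table 6

def cosine_annealed_lr(step, total=TOTAL_ITERS, base_lr=BASE_LR, min_lr=0.0):
    """Standard cosine-annealing schedule, as paired with AdamW
    (0 weight decay) in the quoted setup: decays from base_lr at
    step 0 to min_lr at step `total`."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total))

# With accumulation, each optimizer step sees 4 * 16 = 64 examples.
effective_batch = BATCH_SIZE * ACCUM_STEPS
```

Gradient accumulation here means the optimizer updates once per 16 forward/backward passes, giving an effective batch of 64 while only batch-size-4 activations reside in memory at a time.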