Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization

Authors: Phillip Huang Guo, Aaquib Syed, Abhay Sheshadri, Aidan Ewart, Gintare Karolina Dziugaite

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a rigorous evaluation of several standard editing approaches on factual recall tasks, and we identify mechanisms for factual lookup and attribute extraction on Gemma-7B, Gemma-2-9B, and Llama-3-8B. We demonstrate that gradient-based editing localized on the factual lookup mechanism is more robust than OT localizations and baselines across multiple datasets, models, and evaluations. Our experiments are designed to test the effectiveness of localization for editing of facts.
Researcher Affiliation | Collaboration | 1 Work done while at University of Maryland, College Park; 2 University of Maryland, College Park; 3 Georgia Institute of Technology; 4 University of Bristol; 5 Google DeepMind. Correspondence to: Aaquib Syed <EMAIL>, Phillip Guo <EMAIL>, Gintare Karolina Dziugaite <EMAIL>.
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper. Methods are described in prose.
Open Source Code | No | The paper does not contain an explicit statement about releasing its source code, nor a link to a code repository for the methodology described.
Open Datasets | Yes | We focus on editing subsets of two datasets: (1) the Sports Facts dataset from Nanda et al. (2023), which contains subject-sport relations across three sports categories for 1567 athletes, and (2) the CounterFact dataset from Meng et al. (2023).
Dataset Splits | Yes | To increase the comprehensiveness of our evaluation, we run experiments with different forget set sizes: 16 athletes and 64 athletes. We replicate the methodology of Deeb & Roger (2024), splitting our forget sets into two independent halves, retraining with half of the ground truth labels, and evaluating on the other half.
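The split procedure quoted above can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `split_forget_set` and the fixed seed are assumptions, and the 64-athlete forget set size comes from the quoted row.

```python
import random

def split_forget_set(forget_ids, seed=0):
    """Split a forget set into two independent halves, following the
    Deeb & Roger (2024)-style protocol quoted in the row above:
    retrain on the ground-truth labels of one half, evaluate on the other.
    The seed is a hypothetical choice for reproducibility of the shuffle."""
    ids = list(forget_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]

# Example with the 64-athlete forget set size used in the paper.
retrain_half, eval_half = split_forget_set(range(64))
```

The two halves are disjoint by construction, so evaluation athletes never appear in the retraining labels.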
Hardware Specification | No | For Gemma-2-9b, we are forced to use an 8-bit optimizer to fit our training in the memory of 1 GPU. This states the number of GPUs but not the specific GPU model or other detailed specifications required for reproducibility.
Software Dependencies | No | We fine-tune using... an AdamW optimizer (Kingma & Ba, 2017) with 0 weight decay and a cosine annealing scheduler. For Gemma-2-9b, we are forced to use an 8-bit optimizer to fit our training in the memory of 1 GPU. While the paper mentions the AdamW and 8-bit optimizers, it does not specify versions for any key software components or libraries (e.g., PyTorch, TensorFlow, Python version) that would enable reproducible environment setup.
Experiment Setup | Yes | Across all tasks except Sequential-CounterFact-Editing and all models, we fine-tune using 50 iterations of batch size 4 with 16 accumulation steps, using an AdamW optimizer (Kingma & Ba, 2017) with 0 weight decay and a cosine annealing scheduler. Table 6 has all learning rates used and Table 7 has all injection loss coefficients used.
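The quoted hyperparameters can be summarized in a small sketch. This is illustrative only: the base learning rate below is a placeholder (the actual per-model values are in the paper's Table 6), and the function name is an assumption; the cosine formula matches the standard cosine-annealing schedule decaying to zero.

```python
import math

TOTAL_ITERS = 50   # 50 fine-tuning iterations (paper setup)
BATCH_SIZE = 4     # batch size 4
ACCUM_STEPS = 16   # 16 gradient-accumulation steps
BASE_LR = 2e-5     # placeholder; real learning rates are in the paper's Table 6

def cosine_annealed_lr(step, total=TOTAL_ITERS, base_lr=BASE_LR, min_lr=0.0):
    """Standard cosine-annealing schedule, as paired with AdamW
    (0 weight decay) in the quoted setup: decays from base_lr at
    step 0 to min_lr at step `total`."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total))

# With accumulation, each optimizer step sees 4 * 16 = 64 examples.
effective_batch = BATCH_SIZE * ACCUM_STEPS
```

Gradient accumulation here means the optimizer updates once per 16 forward/backward passes, giving an effective batch of 64 while only batch-size-4 activations reside in memory at a time.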