AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models

Authors: Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Jie Shi, Xiang Wang, Xiangnan He, Tat-Seng Chua

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the effectiveness of our method, we conducted extensive experiments on multiple representative LLMs, such as GPT-2 XL (Radford et al., 2019) and LLaMA-3 (8B). The results show that, compared to the best-performing baseline, AlphaEdit can achieve an average performance improvement of 36.7% by adding just one line of code to the conventional model editing method, MEMIT (Meng et al., 2023), as illustrated in Figure 2. Furthermore, we empirically verified that this simple idea can be easily applied to most existing model editing methods (Meng et al., 2022; 2023; Ma et al., 2025; Gu et al., 2024; Li et al., 2024b), functioning as a plug-and-play enhancement that significantly boosts their performance. This highlights AlphaEdit's crucial role in efficient knowledge updates for LLMs, enabling broader applications and future advancements in the field.
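The "one line of code" referenced above is a null-space projection applied to the MEMIT perturbation. A minimal sketch of the idea, assuming the projector is built from the SVD of the preserved-knowledge key covariance (function and variable names here are illustrative, not taken from the released repository):

```python
# Hedged sketch of null-space constrained editing: build a projector P onto the
# null space of preserved-knowledge keys K0, so that P @ K0 ~= 0 and a projected
# weight update leaves preserved-knowledge outputs unchanged.
import numpy as np

def null_space_projection(K0: np.ndarray, eps: float = 1e-2) -> np.ndarray:
    """Return the (d, d) projector onto the null space of K0 K0^T.

    K0: (d, n) matrix whose columns are key vectors of knowledge to preserve.
    """
    # SVD of the symmetric PSD matrix K0 K0^T; U holds its eigenvectors.
    U, S, _ = np.linalg.svd(K0 @ K0.T)
    # Directions with (near-)zero singular values span the null space.
    null_basis = U[:, S < eps]           # (d, r)
    return null_basis @ null_basis.T     # symmetric, idempotent projector

# The "one line" added to a MEMIT-style update would then be, schematically:
#   delta = delta @ P
# so the perturbation is confined to directions that do not disturb K0.
```

The projector is symmetric and idempotent (P @ P = P), so applying it repeatedly across sequential edits keeps every update inside the same protected subspace.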
Researcher Affiliation | Academia | Junfeng Fang¹, Houcheng Jiang¹, Kun Wang¹, Yunshan Ma², Jie Shi², Xiang Wang¹, Xiangnan He¹, Tat-Seng Chua². ¹University of Science and Technology of China, ²National University of Singapore. EMAIL, EMAIL
Pseudocode | No | The paper describes methods and derivations using mathematical equations (Eqn. 1-6, 8-15) and textual steps, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code | Yes | Our code is available at: https://github.com/jianghoucheng/AlphaEdit.
Open Datasets | Yes | We evaluate AlphaEdit using two widely adopted benchmarks: the Counterfact dataset (Meng et al., 2022) and the ZsRE dataset (Levy et al., 2017). ... In addition, for comprehensive evaluation, Appendix C.7 presents experiments conducted on three additional datasets: Longform Evaluation (Rosati et al., 2024), MQUAKE (Zhong et al., 2023), and KnowEdit (Zhang et al., 2024d).
Dataset Splits | Yes | Table 1 presents the results under a commonly used configuration for the sequential editing task, where 2,000 samples are randomly drawn from the dataset for updates, with 100 samples per edit (i.e., a batch size of 100). For additional experimental results, such as case studies of model outputs after editing, please refer to Appendix C.
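The sequential-editing protocol quoted above can be sketched as follows; the function name and seeding are hypothetical placeholders, not the paper's actual harness:

```python
# Illustrative sketch of the sequential editing configuration: draw 2,000
# samples at random and apply them in 20 successive edits of batch size 100.
import random

def sequential_edit_batches(dataset, n_samples=2000, batch_size=100, seed=0):
    """Yield successive batches of edit requests drawn without replacement."""
    rng = random.Random(seed)
    samples = rng.sample(dataset, n_samples)
    for i in range(0, n_samples, batch_size):
        yield samples[i:i + batch_size]  # one edit of `batch_size` facts
```

Each yielded batch would be applied as a single edit, with the model carried forward between batches rather than reset.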
Hardware Specification | Yes | All experiments are conducted on a single A40 (48GB) GPU.
Software Dependencies | No | The LLMs are loaded using Hugging Face Transformers (Wolf et al., 2019). However, while Hugging Face Transformers is mentioned, no specific version number for the library is provided.
Experiment Setup | Yes | For the GPT-2 XL model, we target critical layers [13, 14, 15, 16, 17] for editing, with the hyperparameter λ set to 20,000. During the computation of hidden representations of the critical layer, we perform 20 optimization steps with a learning rate of 0.5. For the GPT-J model, we target critical layers [3, 4, 5, 6, 7, 8] for editing, with the hyperparameter λ set to 15,000. During the computation of hidden representations of the critical layer, we perform 25 optimization steps, also with a learning rate of 0.5. For the Llama3 (8B) model, we target critical layers [4, 5, 6, 7, 8] for editing, with the hyperparameter λ set to 15,000. During the computation of hidden representations of the critical layer, we perform 25 optimization steps with a learning rate of 0.1.
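The per-model hyperparameters quoted above can be collected in one place; the dictionary layout and key names below are illustrative, not the repository's configuration format:

```python
# Hedged summary of the editing hyperparameters reported in the paper.
# Keys: layers targeted for editing, regularization weight lambda, and the
# optimization steps / learning rate used when computing hidden representations.
EDIT_CONFIGS = {
    "gpt2-xl": {
        "critical_layers": [13, 14, 15, 16, 17],
        "lambda": 20_000,
        "opt_steps": 20,
        "lr": 0.5,
    },
    "gpt-j-6b": {
        "critical_layers": [3, 4, 5, 6, 7, 8],
        "lambda": 15_000,
        "opt_steps": 25,
        "lr": 0.5,
    },
    "llama3-8b": {
        "critical_layers": [4, 5, 6, 7, 8],
        "lambda": 15_000,
        "opt_steps": 25,
        "lr": 0.1,
    },
}
```

Keeping the three configurations side by side makes the pattern visible: the larger models use slightly earlier layers, more optimization steps, and (for Llama3) a smaller learning rate.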