reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Perturbation-Restrained Sequential Model Editing

Authors: Jun-Yu Ma, Hong Wang, Hao-Xiang Xu, Zhen-Hua Ling, Jia-Chen Gu

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Systematically, we conduct experiments employing three editing methods on three LLMs across four downstream tasks. The results show that PRUNE can preserve general abilities while maintaining the editing performance effectively in sequential model editing.
Researcher Affiliation	Academia	Jun-Yu Ma (1, 2), Hong Wang (1), Hao-Xiang Xu (1, 2), Zhen-Hua Ling (1, 2), Jia-Chen Gu (3); (1) University of Science and Technology of China; (2) National Engineering Research Center of Speech and Language Information Processing; (3) University of California, Los Angeles
Pseudocode	No	The paper describes methods and formulas but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code	Yes	The code is available at https://github.com/mjy1111/PRUNE.
Open Datasets	Yes	For factual knowledge, two popular model editing datasets Zero-Shot Relation Extraction (ZSRE) (Levy et al., 2017) and COUNTERFACT (Meng et al., 2022) were adopted in our experiments. ... For conceptual knowledge, the ConceptEdit dataset (Wang et al., 2024) was adopted. ... Reasoning on the GSM8K (Cobbe et al., 2021), Summarization on the SAMSum (Gliwa et al., 2019), Open-domain QA on the Natural Question (Kwiatkowski et al., 2019), and Natural language inference (NLI) on the RTE (Dagan et al., 2005).
Dataset Splits	No	For each dataset, some examples were randomly sampled for evaluation. Details of prompts for each task were shown in Appendix B.4.
Hardware Specification	Yes	We used NVIDIA A800 80GB GPU for experiments.
Software Dependencies	No	The paper mentions using the EasyEdit framework (Wang et al., 2023) and several LLMs (GPT-2 XL, LLaMA-2, LLaMA-3) but does not give version numbers for the programming languages, libraries, or other software dependencies needed for replication.
Experiment Setup	Yes	When conducting experiments, for different editing methods, LLMs and editing datasets, the hyperparameter α in function F of PRUNE is different. Table 4 shows the details of this hyperparameter.