ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

Authors: Jiaang Li, Quan Wang, Zhongnan Wang, Yongdong Zhang, Zhendong Mao

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong setting, outperforming eight baselines while exhibiting strong scalability and preserving LLMs' general abilities on downstream tasks.
Researcher Affiliation Academia 1) University of Science and Technology of China; 2) Beijing University of Posts and Telecommunications
Pseudocode Yes Algorithm 1: Inference with deferral mechanism.
Open Source Code No The paper does not contain an explicit statement about releasing code or a link to a code repository.
Open Datasets Yes Our experiments are conducted on two popular LLMs, i.e., GPT2-XL (Radford et al. 2019) and LLaMA2-7B (Touvron et al. 2023), with two widely used model editing datasets, ZsRE (Levy et al. 2017) and COUNTERFACT (Meng et al. 2022). ... Specifically, we employ a benchmark from (Gu et al. 2024), including eight diverse tasks: Reasoning on GSM8K (Cobbe et al. 2021), Natural Language Inference on RTE (Dagan, Glickman, and Magnini 2005), Open-domain QA on Natural Questions (Kwiatkowski et al. 2019), Closed-domain QA on BoolQ (Clark et al. 2019), Dialogue on MuTual (Cui et al. 2020), Summarization on SAMSum (Cui et al. 2020), Named Entity Recognition on CoNLL03 (Sang and De Meulder 2003), and Sentiment Analysis on SST2 (Socher et al. 2013).
Dataset Splits No The paper states: 'We adopt both datasets to the lifelong model editing setting by extracting a sequence of 1000 editing samples with their rephrasings for our main experiments, following the methodologies outlined in (Hartvigsen et al. 2024) and (Yu et al. 2024)'. This indicates existing methodologies were followed for data preparation but does not provide specific split percentages or counts for training, validation, and test sets within the main text.
Hardware Specification No The paper does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions the base LLMs (GPT2-XL, LLaMA2-7B).
Software Dependencies No The paper mentions several techniques and models (e.g., LoRA, GPT2-XL, LLaMA2) and implies common deep-learning frameworks (e.g., PyTorch), but it does not specify version numbers for any software libraries or frameworks used in the implementation.
Experiment Setup Yes For our proposed ELDER across all settings, the rank of LoRAs is set to 8, and the number of layers that apply mixture-of-LoRA is set to 6. The number of LoRAs per layer is set to 4, k is set to 2, and ϵ is set to 12. λ is set to 1e-2. More details of training and hyperparameter tuning are available in the technical appendix.
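The reported hyperparameters (4 LoRA experts per layer, rank 8, top-k routing with k=2) can be illustrated with a minimal mixture-of-LoRA forward pass. This is a sketch under stated assumptions, not the authors' implementation: the router design, weight shapes, and initialization below are illustrative, and the deferral-mechanism details (ϵ, λ) from the paper are omitted.

```python
import numpy as np

# Minimal mixture-of-LoRA sketch using the paper's reported settings:
# 4 LoRA experts per layer, rank r = 8, top-k routing with k = 2.
# The router, dimensions, and initialization here are assumptions.
rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8       # r = LoRA rank (paper: 8)
n_experts, k = 4, 2              # paper: 4 LoRAs per layer, k = 2

W = rng.normal(size=(d_out, d_in)) * 0.02           # frozen base weight
A = rng.normal(size=(n_experts, r, d_in)) * 0.02    # LoRA down-projections
B = np.zeros((n_experts, d_out, r))                 # LoRA up-projections (zero-init)
router = rng.normal(size=(n_experts, d_in)) * 0.02  # routing weights (assumption)

def forward(x):
    """Route x to the top-k LoRA experts and add their gated deltas."""
    logits = router @ x
    topk = np.argsort(logits)[-k:]                  # indices of the top-k experts
    gates = np.exp(logits[topk] - logits[topk].max())
    gates /= gates.sum()                            # softmax over selected experts
    delta = sum(g * (B[i] @ (A[i] @ x)) for g, i in zip(gates, topk))
    return W @ x + delta

x = rng.normal(size=d_in)
y = forward(x)
print(y.shape)  # (64,)
```

With the conventional zero-initialized up-projections, the LoRA delta starts at zero, so the edited layer initially reproduces the frozen base mapping; editing then only trains the selected experts and router.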