LLM Unlearning via Loss Adjustment with Only Forget Data

Authors: Yaxuan Wang, Jiaheng Wei, Yuhao Liu, Jinlong Pang, Quan Liu, Ankit Parag Shah, Yujia Bao, Yang Liu, Wei Wei

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results demonstrate that our approach not only achieves superior unlearning performance compared to existing methods but also minimizes the impact on the model's retained capabilities, ensuring high utility across diverse tasks, including copyrighted content unlearning on the Harry Potter dataset and MUSE Benchmark, and entity unlearning on the TOFU dataset."
Researcher Affiliation | Collaboration | 1University of California, Santa Cruz; 2Center for Advanced AI, Accenture; 3The Hong Kong University of Science and Technology (Guangzhou)
Pseudocode | No | The paper describes steps such as "Step 1: Equip example/template responses ye for each forget sample xf" and "Step 2: Loss adjustments w.r.t. the sample pairs (xf, ye, yf)". These steps are described in paragraph form rather than formatted as structured pseudocode or an algorithm block.
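The two steps quoted above can be sketched as a toy loss-adjustment routine. This is a hedged illustration only: the function name and the simple difference-of-log-likelihoods form are our assumptions, not the paper's actual FLAT objective, which adjusts the loss over the template/forget likelihood pair via an f-divergence-based formulation.

```python
def loss_adjustment(logp_template, logp_forget, beta=1.0):
    """Toy unlearning loss for one forget pair (x_f, y_e, y_f).

    logp_template: log-likelihood of the template response y_e given x_f
    logp_forget:   log-likelihood of the original forget response y_f given x_f

    The loss rewards probability mass on the template response and
    penalizes mass on the forget response. NOTE: a sketch under our own
    assumptions; the paper's actual objective differs in form.
    """
    return -logp_template + beta * logp_forget
```

As a sanity check, a model that prefers the template response (higher `logp_template`, lower `logp_forget`) incurs a lower loss than one that still reproduces the forget response.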
Open Source Code | Yes | "The code is available at https://github.com/UCSC-REAL/FLAT."
Open Datasets | Yes | "including copyrighted content unlearning on the Harry Potter dataset and MUSE Benchmark, and entity unlearning on the TOFU dataset" ... Harry Potter and the Sorcerer's Stone (Rowling, 1997) ... C4 dataset (Raffel et al., 2020) ... Wikitext (Merity et al., 2016) ... BoolQ (Clark et al., 2019), RTE (Dagan et al., 2005), HellaSwag (Zellers et al., 2019), Winogrande (Sakaguchi et al., 2021), ARC-Challenge (Chollet, 2019), ARC-Easy (Chollet, 2019), OpenBookQA (Mihaylov et al., 2018), PIQA (Bisk et al., 2020), and TruthfulQA (Lin et al., 2021) ... News consists of BBC news articles (Li et al., 2023b)
Dataset Splits | Yes | "We extract 400 chunks from the Harry Potter book series dataset (Eldan & Russinovich, 2023), with each chunk containing up to 512 tokens, to create the forget dataset Df. We sample 400 paragraphs in the C4 dataset (Raffel et al., 2020) as the retain data Dr." ... The TOFU dataset (Maini et al., 2024a) is a synthetic question-answering dataset focused on author biographies, aiming to enable an LLM to unlearn a portion of fictitious authors while retaining knowledge about the rest and real-world facts. The dataset includes 200 fake authors, each with 20 QA pairs, and experiments are conducted with 1%, 5%, or 10% of these authors marked for unlearning. ... "All articles are randomly divided into forget, retain, and holdout sets."
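The forget-set construction described above (400 chunks, each up to 512 tokens) can be sketched as follows. This is a minimal sketch under assumptions: whitespace splitting stands in for the model's subword tokenizer, and the function name is ours, not from the paper's code.

```python
def make_chunks(text, max_tokens=512, n_chunks=400):
    """Split a long text into consecutive chunks of up to `max_tokens`
    tokens, keeping at most `n_chunks` chunks.

    Sketch only: whitespace tokens approximate the real tokenizer used
    in the paper's preprocessing.
    """
    tokens = text.split()
    chunks = []
    for start in range(0, len(tokens), max_tokens):
        if len(chunks) == n_chunks:
            break
        chunks.append(" ".join(tokens[start:start + max_tokens]))
    return chunks
```

Applied to the full book text, the first 400 such chunks would form the forget dataset Df.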
Hardware Specification | No | The paper mentions using specific LLM models such as OPT-2.7B, Llama2-7B, and Phi-1.5B for experiments but does not specify the underlying hardware (e.g., GPU models, CPU types) on which these models were trained or fine-tuned.
Software Dependencies | No | The paper mentions that "AdamW serves as the optimizer" and cites the "LM Evaluation Harness (Gao et al., 2023)" but does not provide specific version numbers for these or other software components such as programming languages or libraries.
Experiment Setup | Yes | "The finetuning procedure for the OPT-2.7B and Llama2-7B models involves a learning rate of 1e-5 and a batch size of 2. AdamW serves as the optimizer for preparing these models. For baseline methods, we set the batch size and learning rate to be the same as in their original papers, and fine-tune for 5 epochs using the AdamW optimizer. For our method, we use the same training hyper-parameters as the baselines but set the learning rate to 2e-7. ... For all LLM unlearning methods, we set the batch size to 32 following previous works ... For Phi-1.5B, we fine-tune the pre-trained models for 5 epochs using a learning rate of 2e-5 to obtain the original model. Similarly, we fine-tune Llama2-7B and OPT-2.7B for the same duration with a learning rate of 1e-5."
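The hyper-parameters quoted above can be collected into a single configuration sketch. Only the values come from the source; the dictionary layout and key names are our own, and values the paper does not report (e.g., Phi-1.5B's fine-tuning batch size) are deliberately left out.

```python
# Fine-tuning setup used to prepare the original models (per the paper).
FINETUNE = {
    "OPT-2.7B":  {"lr": 1e-5, "batch_size": 2, "epochs": 5, "optimizer": "AdamW"},
    "Llama2-7B": {"lr": 1e-5, "batch_size": 2, "epochs": 5, "optimizer": "AdamW"},
    "Phi-1.5B":  {"lr": 2e-5, "epochs": 5, "optimizer": "AdamW"},  # batch size not reported
}

# Unlearning-stage settings shared across methods, plus the proposed
# method's learning rate.
UNLEARN = {
    "batch_size": 32,   # used for all LLM unlearning methods
    "epochs": 5,        # baselines fine-tuned for 5 epochs
    "flat_lr": 2e-7,    # proposed method; other hyper-parameters match baselines
}
```

Such a config makes it easy to see the one deviation the paper reports for its own method: a much smaller learning rate (2e-7) with otherwise baseline-matched settings.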