MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Authors: Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah Smith, Chiyuan Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models... Using these criteria, we benchmark how effectively eight popular unlearning algorithms on 7B-parameter LMs can unlearn Harry Potter books and news articles. Our results demonstrate that most algorithms can prevent verbatim memorization and knowledge memorization to varying degrees, but only one algorithm does not lead to severe privacy leakage. |
| Researcher Affiliation | Collaboration | 1University of Washington 2Princeton University 3University of Southern California 4University of Chicago 5Google Research |
| Pseudocode | No | The paper describes unlearning methods like Gradient Ascent and Negative Preference Optimization (NPO) and provides the mathematical objective for NPO, but it does not present these or any other procedures in a structured pseudocode block labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | Code We will provide the code for all baseline methods, evaluation scripts used for benchmarking, as well as the code for visualizations and analysis presented in this paper. |
| Open Datasets | Yes | NEWS consists of BBC news articles (Li et al., 2023b) collected after August 2023. All articles are randomly divided into (disjoint) forget, retain, and holdout sets. BOOKS consists of the Harry Potter book series. To simulate a real-world setting for testing utility preservation (C4), we include different types of materials in the forget and retain sets. The forget set contains the original books, while the retain set contains related content from the Harry Potter Fan Wiki, harrypotter.fandom.com/wiki |
| Dataset Splits | Yes | All articles are randomly divided into (disjoint) forget, retain, and holdout sets. The sizes of the forget and retain sets are reported in tokens in parentheses. Note that only the Verbatim texts within the Forget Set are included in our training data, while all Knowledge sets (QA pairs) serve for evaluations. ... To simulate sequential unlearning, we partition the extended NEWS forget set (comprised of 3.3M tokens) into four disjoint folds (each containing 0.8M tokens) and apply the unlearning methods to each fold in a sequential manner. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA A40 GPU cards in a single node. |
| Software Dependencies | No | The paper mentions specific models like 'LLaMA-2 7B' and 'ICLM-7B' and an optimizer 'AdamW optimizer' but does not specify version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Following prior work (Maini et al., 2024), we run GA, NPO, and their regularized variants using the AdamW optimizer (Loshchilov & Hutter, 2017) with a constant learning rate of 1e-5 and a batch size of 32. We employ the stopping criteria as follows: if the utility (i.e., KnowMem on D_retain) of a model undergoing unlearning drops below that of f_retrain within 10 epochs of unlearning, we stop at the first epoch where this condition holds; otherwise, we take a checkpoint from the 10th epoch. For Task Vector and WHP, to obtain the reinforced model for unlearning, we fine-tune the target model for 10 epochs using the same learning rate and batch size. |
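The stopping criterion quoted in the Experiment Setup row can be sketched as a short loop. This is a hypothetical illustration, not the paper's released code: `step_fn` (one unlearning epoch, e.g. a GA or NPO pass), `utility_fn` (KnowMem on D_retain), and `retrain_utility` (the retrained model's utility) are assumed callables/values standing in for the paper's actual components.

```python
def unlearn_with_stopping(step_fn, utility_fn, retrain_utility, max_epochs=10):
    """Hedged sketch of the paper's stopping rule: run unlearning epoch by
    epoch; if utility (KnowMem on D_retain) drops below that of the retrained
    model within max_epochs, stop at the first epoch where this holds;
    otherwise keep the checkpoint from the final (10th) epoch."""
    for epoch in range(1, max_epochs + 1):
        step_fn()                        # one unlearning epoch (e.g. GA/NPO, lr=1e-5, batch=32)
        if utility_fn() < retrain_utility:
            return epoch                 # early stop: utility fell below f_retrain
    return max_epochs                    # no drop observed: take the 10th-epoch checkpoint
```

The early-stop-on-utility design bounds over-unlearning: without it, additional epochs would keep degrading utility on the retain set well past the retrained-model baseline.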