Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
Authors: Lingzhi Wang, Xingshan Zeng, Jinsong Guo, Kam-Fai Wong, Georg Gottlob
AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper explores Machine Unlearning (MU), an emerging field that is gaining increased attention due to concerns about neural models unintentionally remembering personal or sensitive information. We present SEUL, a novel method that enables selective and fine-grained unlearning for language models. Furthermore, we introduce two innovative evaluation metrics, sensitive extraction likelihood (S-EL) and sensitive memorization accuracy (S-MA), specifically designed to assess the effectiveness of forgetting sensitive information. In support of the unlearning framework, we propose efficient automatic online and offline sensitive span annotation methods. The paper also includes sections titled "Experimental Setup" and "Experimental Results" discussing evaluations on datasets and comparisons to baselines. |
| Researcher Affiliation | Collaboration | Lingzhi Wang: Harbin Institute of Technology, Shenzhen, China; Xingshan Zeng: Huawei Noah's Ark Lab, China; Jinsong Guo: Unlimidata Ltd, United Kingdom; Kam-Fai Wong: The Chinese University of Hong Kong, China; Georg Gottlob: University of Calabria, Italy. The affiliations include both academic institutions (Harbin Institute of Technology, The Chinese University of Hong Kong, University of Calabria) and industry labs/companies (Huawei Noah's Ark Lab, Unlimidata Ltd). |
| Pseudocode | No | The paper describes methods like online selection and offline annotation in prose and provides mathematical formulations (e.g., Equation 1 and 2), but it does not include any explicitly labeled pseudocode blocks or algorithms with structured steps formatted like code. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology described is publicly available. Phrases like "We release our code..." or direct repository links are absent. |
| Open Datasets | Yes | The forget set is sourced from the Training Data Extraction Challenge (https://github.com/google-research/lm-extraction-benchmark). To evaluate general language modeling capabilities, we employ 8 classification tasks (Hellaswag (Zellers et al. 2019), Winogrande (Sakaguchi et al. 2021), COPA (Gordon, Kozareva, and Roemmele 2012), ARC-Easy (Clark et al. 2018), ARC-Challenge (Clark et al. 2018), PIQA (Bisk et al. 2020), MathQA (Amini et al. 2019), and PubMedQA (Jin et al. 2019)) and 4 dialogue tasks (Wizard of Wikipedia (Dinan et al. 2019), Empathetic Dialogues (Rashkin et al. 2019), Blended Skill Talk (Smith et al. 2020), and Wizard of Internet (Komeili, Shuster, and Weston 2022)). |
| Dataset Splits | No | The paper mentions using a "forget dataset Df" and a "test set Dt" for evaluation, and specifies the forget set comprises 15,000 examples. It lists various benchmark datasets used for classification and dialogue tasks. However, it does not provide explicit details on how these datasets were split into training, validation, and test sets, either by percentages, sample counts, or references to specific predefined splits used for their experiments beyond mentioning the datasets themselves. |
| Hardware Specification | Yes | All the models are trained with a single Nvidia GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions the use of pre-trained language models like GPT-Neo series (125M, 1.3B, and 2.7B), Llama2-7B and Mistral-7B, but does not specify versions for any ancillary software libraries or frameworks (e.g., PyTorch, TensorFlow, Python version) used for the implementation. |
| Experiment Setup | Yes | The learning rate for training is set to 5 × 10⁻⁵, based on the selection from [2 × 10⁻⁵, 5 × 10⁻⁵, 1 × 10⁻⁴]. The variable denoting the number of forgetting instances, represented as d, is examined across the values d = 1, 2, 4, 8, 16, 32, 64, 128. Unless otherwise specified, the reported results in this paper are based on the d = 32 setting. We adapt the global batch size during training to be the same as d, the number of forgetting instances, following Jang et al. (2023). Each setting is run 5 times and the reported results are the average of 5 different runs. |
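The experiment-setup row above can be sketched as a small sweep harness. This is a minimal illustration, not the authors' code: `run_unlearning` is a hypothetical placeholder for one unlearning run, and only the stated hyperparameters (learning rate 5 × 10⁻⁵, d ∈ {1, …, 128}, global batch size equal to d, 5 runs averaged per setting) come from the paper.

```python
import statistics

LEARNING_RATE = 5e-5                       # selected from [2e-5, 5e-5, 1e-4]
D_VALUES = [1, 2, 4, 8, 16, 32, 64, 128]   # number of forgetting instances
NUM_RUNS = 5                               # results averaged over 5 runs

def run_unlearning(d, seed, lr=LEARNING_RATE):
    """Hypothetical stand-in for one unlearning run.

    The global batch size tracks d, following Jang et al. (2023).
    A real implementation would train here and return an evaluation metric.
    """
    batch_size = d  # global batch size set equal to d
    return 0.0      # dummy metric for illustration only

def averaged_results():
    # Average each d setting over NUM_RUNS seeded runs, as the paper reports.
    results = {}
    for d in D_VALUES:
        scores = [run_unlearning(d, seed) for seed in range(NUM_RUNS)]
        results[d] = statistics.mean(scores)
    return results
```

The paper's headline numbers correspond to the d = 32 entry of such a sweep.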