Offset Unlearning for Large Language Models

Authors: James Y. Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, Muhao Chen

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that δ-Unlearning can effectively unlearn target data while maintaining similar or even stronger performance on general out-of-forget-scope tasks. The method is evaluated on TOFU (Maini et al., 2024), a widely used LLM unlearning benchmark containing knowledge about fictitious authors; the experimental results on TOFU are shown in Tab. 2.
Researcher Affiliation | Collaboration | James Y. Huang (University of Southern California), Sheng Zhang (Microsoft Research), Muhao Chen (University of California, Davis)
Pseudocode | No | No pseudocode or algorithm block is present. The methodology is described in prose and mathematical formulas.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor a link to a code repository for the described methodology.
Open Datasets | Yes | Experiments are conducted on TOFU (Maini et al., 2024), a widely used benchmark for evaluating LLM unlearning. In addition to TOFU, the unlearned model is assessed for preservation of general utility on well-established benchmarks, including ARC (Clark et al., 2018), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), and OpenBookQA (Mihaylov et al., 2018).
Dataset Splits | No | The paper defines evaluation sets from the TOFU benchmark (Forget Set, Retain Set, Real Author, World Fact) but does not provide explicit training/validation/test splits, their percentages, or the methodology used to create them.
Hardware Specification | Yes | All models are trained using NVIDIA A100 GPUs for 5 epochs with a batch size of 32.
Software Dependencies | No | The paper mentions specific Llama2 models (Llama2-13b-chat-hf and Llama2-7b-chat-hf) but does not provide version numbers for the programming languages or libraries used in the implementation.
Experiment Setup | Yes | All models are trained using NVIDIA A100 GPUs for 5 epochs with a batch size of 32. α is set to 1 in the experiments. Following Yao et al. (2024), all models are matched to the target ROUGE score by adjusting the learning rate.
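The α = 1 setting above refers to how δ-Unlearning steers the larger model: the logit difference (δ) between a small unlearned model and its unmodified counterpart is added, scaled by α, to the large model's logits. A minimal NumPy sketch of that composition, assuming the offset formulation `logits_large + α · (logits_small_unlearned − logits_small_base)`; the function name and all array values are illustrative, not from the paper:

```python
import numpy as np

def delta_unlearning_logits(logits_large, logits_small_base,
                            logits_small_unlearned, alpha=1.0):
    """Offset the large model's logits with the delta contributed by a
    small model pair (unlearned minus base). alpha=1 matches the
    experiment setup reported above."""
    delta = logits_small_unlearned - logits_small_base
    return logits_large + alpha * delta

# Toy example over a 4-token vocabulary.
large = np.array([2.0, 1.0, 0.5, 0.0])          # large model's next-token logits
small_base = np.array([1.5, 0.8, 0.4, 0.1])     # small model before unlearning
small_unl = np.array([0.5, 0.8, 0.4, 0.1])      # small model suppresses token 0 after unlearning

adjusted = delta_unlearning_logits(large, small_base, small_unl, alpha=1.0)
# Token 0's logit drops by the small models' delta (1.0), steering the
# large model away from the forgotten answer without modifying its weights.
```

Since only logits are combined, the large model can remain a black box: unlearning updates touch the small model pair alone.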