Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond

Authors: Qizhou Wang, Jin Zhou, (Andrew) Zhanke Zhou, Saebyeol Shin, Bo Han, Kilian Weinberger

ICLR 2025

Reproducibility assessment — each variable lists the result and the supporting LLM response:
Research Type: Experimental
  LLM Response: "We benchmark both existing and new methods explored throughout our analysis on the well-established TOFU fictitious unlearning datasets (Maini et al., 2024). Our experiments identify several new state-of-the-arts that merit further attention. Additionally, based on our analysis, we highlight promising research directions that warrant exploration to further advance the field."
Researcher Affiliation: Academia
  LLM Response: "1. TMLR Group, Department of Computer Science, Hong Kong Baptist University; 2. Department of Computer Science, Cornell University"
Pseudocode: No
  LLM Response: "The paper describes methods using mathematical formulations and textual descriptions, but no explicit pseudocode or algorithm blocks are provided."
Open Source Code: Yes
  LLM Response: "The code is publicly available at: https://github.com/tmlr-group/G-effect."
Open Datasets: Yes
  LLM Response: "We benchmark both existing and new methods explored throughout our analysis on the well-established TOFU fictitious unlearning datasets (Maini et al., 2024)."
Dataset Splits: Yes
  LLM Response: "For the unlearning setups, the original TOFU data are separated into targeted and non-targeted parts, of which the adopted proportions are 1:99 (1% unlearning), 5:95 (5% unlearning), and 10:90 (10% unlearning). Moreover, we separate 400 non-targeted data that are not involved during the unlearning procedure for evaluations."
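The split described above (targeted vs. non-targeted parts at 1:99, 5:95, or 10:90, plus 400 held-out non-targeted examples for evaluation) can be sketched as follows. This is a minimal illustration, not the paper's released code; the function and variable names are hypothetical, and the actual implementation at https://github.com/tmlr-group/G-effect may differ.

```python
import random

def split_tofu(examples, forget_percent, n_eval_holdout=400, seed=0):
    """Split a dataset into forget (targeted), retain (non-targeted),
    and held-out evaluation sets, per the proportions reported above.

    Illustrative sketch only; `forget_percent` is 1, 5, or 10.
    """
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)

    # Targeted portion, e.g. 5% of the data for the 5:95 setup.
    n_forget = len(examples) * forget_percent // 100
    forget = [examples[i] for i in idx[:n_forget]]
    retain = [examples[i] for i in idx[n_forget:]]

    # Hold out 400 non-targeted examples, unseen during unlearning.
    eval_set, retain = retain[:n_eval_holdout], retain[n_eval_holdout:]
    return forget, retain, eval_set
```

For instance, with 4,000 examples and `forget_percent=5`, this yields 200 forget examples, 400 evaluation examples, and 3,400 retain examples.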
Hardware Specification: Yes
  LLM Response: "Moreover, our experiments are conducted on computation nodes equipped with NVIDIA-A100-80GB GPUs and Intel(R) Xeon(R) Gold 6248R CPUs."
Software Dependencies: Yes
  LLM Response: "The systems utilize Transformers version 4.42.4 and CUDA version 12.1."
Experiment Setup: Yes
  LLM Response: "We default to apply the following settings: the AdamW optimizer (Loshchilov & Hutter, 2017), a batch size of 16, a maximal gradient norm of 1, and the (un)learning rate of 2e-5 for Phi-1.5 and 1e-5 for Llama-2-7b with linear warm-up for the first epoch. Each method is executed over a total of 5 epochs. Moreover, the model-specific hyper-parameters after fine-tuning are as follows: For the 1% and 5% setups, we set α = 5 for WGA; β = 0.5 for NPO; β = 4 for TNPO; α = 1.5 and β = 5 for WTNPO. For the 10% setup, we set α = 7 for WGA; β = 0.5 for NPO; β = 5 for TNPO; α = 1.5 and β = 7 for WTNPO. For RMU, we set the 9th layer with c = 4 for Phi-1.5 and the 21st layer with c = 2 for Llama-2-7B."
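The reported optimization settings (model-specific learning rates, batch size 16, gradient-norm cap of 1, 5 epochs with linear warm-up over the first epoch) can be sketched as a small schedule function. This is a hedged reconstruction: the paper does not state what happens after warm-up, so a constant rate is assumed here, and all names are illustrative rather than taken from the released code.

```python
# Reported unlearning hyper-parameters (per the experiment setup above).
BASE_LR = {"phi-1.5": 2e-5, "llama-2-7b": 1e-5}
BATCH_SIZE = 16
MAX_GRAD_NORM = 1.0
NUM_EPOCHS = 5

def lr_at(step, total_steps, model="phi-1.5"):
    """Linear warm-up over the first of 5 epochs, then (assumed) a
    constant rate. `step` is 0-indexed; `total_steps` covers all epochs."""
    warmup_steps = total_steps // NUM_EPOCHS  # first epoch of five
    base = BASE_LR[model]
    if step < warmup_steps:
        return base * (step + 1) / warmup_steps
    return base
```

With 100 total steps for Phi-1.5, the rate ramps from 1e-6 at step 0 to 2e-5 by step 19, the end of the first epoch.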