Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning

Authors: Puning Yang, Qizhou Wang, Zhuo Huang, Tongliang Liu, Chengqi Zhang, Bo Han

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | In this paper, we identify two distinct goals of loss reweighting, namely, saturation and importance: the former indicates that insufficiently optimized data should be emphasized, while the latter stresses critical data that are most influential for loss minimization. To study their usefulness, we design specific reweighting strategies for each goal and evaluate their respective effects on unlearning. We conduct extensive empirical analyses on well-established benchmarks and summarize some important observations as follows: (i) saturation enhances efficacy more than importance-based reweighting, and their combination can yield additional improvements; (ii) saturation typically allocates lower weights to data with lower likelihoods, whereas importance-based reweighting does the opposite; (iii) the efficacy of unlearning is also largely influenced by the smoothness and granularity of the weight distributions. Based on these findings, we propose SatImp, a simple reweighting method that combines the advantages of both saturation and importance. Empirical results on extensive datasets validate the efficacy of our method, potentially bridging existing research gaps and indicating directions for future research. Our code is available at https://github.com/tmlrgroup/SatImp.
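The excerpt does not give the exact SatImp weighting formula, but it does describe its ingredients: a saturation-style term that gives higher weight to higher-likelihood tokens, an importance-style term that does the opposite, and a weight distribution whose smoothness matters. Purely as an illustrative sketch under those assumptions (the functional form, and the use of the reported hyper-parameters p and τ as exponent and softmax temperature, are our inventions, not the paper's method):

```python
import math

def illustrative_token_weights(likelihoods, p=0.3, tau=1.0):
    """Illustrative only -- not the actual SatImp formula, which this
    excerpt does not state. Combines a saturation-style factor that grows
    with token likelihood and an importance-style factor that shrinks
    with it, then smooths the scores with a softmax at temperature tau."""
    scores = [(l ** p) * ((1.0 - l) ** p) for l in likelihoods]
    # Softmax with temperature tau: larger tau flattens the distribution,
    # matching the paper's observation that smoothness affects efficacy.
    exps = [math.exp(s / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

The returned weights form a distribution over tokens, so they can directly rescale a token-wise unlearning loss.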
Researcher Affiliation | Academia | 1 TMLR Group, Department of Computer Science, Hong Kong Baptist University; 2 Sydney AI Centre, The University of Sydney; 3 Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University. Correspondence to: Bo Han <EMAIL>.
Pseudocode | Yes | Algorithm 1 Hard Sampling
1: Input: token-wise likelihoods L = {ℓ_k} for a data point (x, y), sampling strategy, fraction β
2: Output: weights w^TopK, w^BottomK, w^Random
3: Get sampling size s = β · len(y)
4: Initialize w = {w_i = 0 | i = 1, 2, ..., len(y)}
5: Sort likelihoods: L ← sortAscending(L)
6: if strategy is TopK then
7:   w_k^TopK = 1, k ∈ L[0 : s]
8: else if strategy is BottomK then
9:   w_k^BottomK = 1, k ∈ L[len(y) − s : len(y)]
10: else if strategy is Random then
11:   w_k^Random = 1, k ∈ randomSampling(y, s)
12: end if
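Algorithm 1 can be sketched in Python as follows. Binary 0/1 weights and floor rounding for the sampling size s are assumptions, since the excerpt does not specify the rounding rule:

```python
import math
import random

def hard_sampling(likelihoods, strategy="TopK", beta=0.25):
    """Assign a weight of 1 to a fraction beta of tokens, 0 to the rest.

    Token indices are ordered by ascending likelihood, mirroring
    Algorithm 1: TopK takes the first s indices after sorting (lowest
    likelihoods), BottomK takes the last s (highest likelihoods), and
    Random samples s indices uniformly without replacement.
    """
    n = len(likelihoods)
    s = max(1, math.floor(beta * n))  # sampling size (floor is assumed)
    weights = [0.0] * n
    order = sorted(range(n), key=lambda i: likelihoods[i])  # ascending
    if strategy == "TopK":
        chosen = order[:s]
    elif strategy == "BottomK":
        chosen = order[n - s:]
    elif strategy == "Random":
        chosen = random.sample(range(n), s)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    for k in chosen:
        weights[k] = 1.0
    return weights
```

For example, with likelihoods [0.9, 0.1, 0.5, 0.3] and beta = 0.5, TopK selects the two lowest-likelihood tokens (indices 1 and 3), while BottomK selects the two highest (indices 2 and 0).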
Open Source Code | Yes | Our code is available at https://github.com/tmlrgroup/SatImp.
Open Datasets | Yes | Configurations. We conduct our experiments on the well-established TOFU benchmark (Maini et al., 2024), comprising synthetic author profiles to test the efficacy of various unlearning methods in addressing privacy concerns. This benchmark includes three unlearning settings, aiming to remove 1%, 5%, and 10% of the total dataset. We employ the pre-trained models LLaMA-2-7B (Touvron et al., 2023) and Phi-1.5 (Li et al., 2023b). For performance assessment, we report the ES scores (Wang et al., 2025a) by default, which have been shown to be more reliable than others, such as the FQ and MU proposed by the TOFU benchmark. Generally, we prefer smaller ES scores for targeted data and larger values for non-targeted data, reflecting the model's ability of removal and retention, respectively. In our experiments, we default to setting the hyper-parameters to p = 0.3 and τ = 1. Moreover, we explore other benchmarks such as WMDP (Li et al., 2024) and MUSE (Shi et al., 2024), and employ additional metrics to further solidify our analyses.
Dataset Splits | Yes | The TOFU dataset comprises 4,000 question-answer pairs about 200 fictional authors, with each author assigned 20 QA pairs. Regarding specific unlearning tasks, TOFU establishes three settings: 1% (2 authors, 40 QA pairs), 5% (10 authors, 200 QA pairs), and 10% (20 authors, 400 QA pairs). The TOFU authors provide fine-tuned models of LLaMA-2-7B and Phi-1.5 on the full TOFU dataset. Additionally, the authors provide fine-tuned results of LLaMA-2-7B and Phi-1.5 models on the Retain data to support subsequent performance evaluation.
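The reported split sizes follow directly from 200 authors × 20 QA pairs each; a quick arithmetic check:

```python
TOTAL_AUTHORS = 200
QA_PER_AUTHOR = 20  # 200 * 20 = 4,000 QA pairs in total

splits = {}
for frac in (0.01, 0.05, 0.10):
    authors = round(TOTAL_AUTHORS * frac)
    splits[f"{frac:.0%}"] = (authors, authors * QA_PER_AUTHOR)

print(splits)  # {'1%': (2, 40), '5%': (10, 200), '10%': (20, 400)}
```

Each setting's (authors, QA pairs) tuple matches the figures quoted above.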
Hardware Specification | Yes | Environmental Configurations. Our experiments are completed on 8 NVIDIA A100 GPUs, with Python 3.10.14 and PyTorch 2.1.0.
Software Dependencies | Yes | Environmental Configurations. Our experiments are completed on 8 NVIDIA A100 GPUs, with Python 3.10.14 and PyTorch 2.1.0.
Experiment Setup | Yes | TOFU experiment setup. Following common practice on the TOFU benchmark, we utilize a linear warm-up learning rate during the first epoch, followed by a linearly decaying learning rate in the remaining epochs. Before the unlearning process, we fine-tune the LLaMA-2-7B model on the TOFU full set for 5 epochs with a batch size of 32 and a learning rate of 1e-5 to obtain the original model. Phi-1.5 is also fine-tuned, with a different learning rate of 2e-5. For the three unlearning settings, Forget01, Forget05, and Forget10, we train both LLaMA-2-7B and Phi-1.5 for 5 epochs with a batch size of 16. The learning rates for LLaMA-2-7B and Phi-1.5 are set to 1e-5 and 2e-5, respectively. WMDP experiment setup. We utilize the officially provided Zephyr-7B-beta model in the WMDP benchmark. The total training budget is set to 125 steps for all unlearning objectives. The learning rate is set to 4e-6, with a linear warm-up during the first 25 steps, followed by a linear decay in the remaining steps. The batch size is set to 16. MUSE experiment setup. We utilize the officially provided LLaMA-2-7B for the News task, and the ICLM-7B for the Books task. Each model is trained for 10 epochs with a constant learning rate of 1e-5 and a batch size of 16. Results are selected from 10 checkpoints saved at each epoch.
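The linear warm-up/decay schedule described above (e.g., for WMDP: peak 4e-6, 25 warm-up steps out of 125 total) can be sketched as a per-step function. The exact interpolation endpoints (ramp to peak at the last warm-up step, decay toward 0) are assumptions, as the excerpt does not specify them:

```python
def linear_warmup_decay_lr(step, total_steps, warmup_steps, peak_lr):
    """Learning rate at a 0-indexed step: linearly ramp from near 0 to
    peak_lr over warmup_steps, then linearly decay toward 0 over the
    remaining steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * (1.0 - progress)
```

With the WMDP numbers, the rate reaches 4e-6 at step 24 and decreases linearly thereafter; in practice the same shape is obtained from a framework scheduler (e.g., a lambda-based scheduler in PyTorch).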