Catastrophic Failure of LLM Unlearning via Quantization
Authors: Zhiwei Zhang, Fali Wang, Xiaomin Li, Zongyu Wu, Xianfeng Tang, Hui Liu, Qi He, Wenpeng Yin, Suhang Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments using various quantization techniques across multiple precision levels to thoroughly evaluate this phenomenon. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision, which significantly increases to 83% after 4-bit quantization. |
| Researcher Affiliation | Collaboration | Zhiwei Zhang¹, Fali Wang¹, Xiaomin Li², Zongyu Wu¹, Xianfeng Tang³, Hui Liu³, Qi He³, Wenpeng Yin¹, Suhang Wang¹ — ¹The Pennsylvania State University, ²Harvard University, ³Amazon (email addresses redacted) |
| Pseudocode | No | The paper includes mathematical equations (Equation 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13) and conceptual diagrams (Figure 1, 2) but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at: https://github.com/zzwjames/FailureLLMUnlearning. |
| Open Datasets | Yes | We conduct experiments on MUSE (Shi et al., 2024b), a benchmark for evaluating machine unlearning in language models, using two datasets: NEWS and BOOKS. The NEWS dataset (Li et al., 2023b) includes recent BBC news articles... The BOOKS dataset (Eldan & Russinovich, 2023) features the Harry Potter series, with original novels as the forget set and related Fan Wiki materials as the retain set... |
| Dataset Splits | No | The NEWS dataset (Li et al., 2023b) includes recent BBC news articles divided into forget, retain, and holdout sets. The BOOKS dataset (Eldan & Russinovich, 2023) features the Harry Potter series, with original novels as the forget set and related Fan Wiki materials as the retain set to preserve domain knowledge post-unlearning. |
| Hardware Specification | No | The paper mentions models like LLaMA-2 7B and ICLM-7B and states they were fine-tuned, but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for these experiments. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer (Loshchilov et al., 2017) but does not provide specific version numbers for software libraries, frameworks (like PyTorch or TensorFlow), or other key dependencies used in the implementation. |
| Experiment Setup | Yes | Following the experimental setup in (Shi et al., 2024b), we implement six unlearning methods: GA, GA_GDR, GA_KLR, NPO, NPO_GDR, and NPO_KLR, using the AdamW optimizer (Loshchilov et al., 2017) with a fixed learning rate of 1e-5. We conduct the experiments over 10 and 5 epochs for the NEWS and BOOKS datasets, respectively. A grid search across {2, 5, 10, 100, 300} determines the optimal weight α for the utility constraint on the retain dataset, balancing unlearning performance with model utility. Table 4 shows the regularization weight on the retain dataset for each method. The detailed hyperparameter selection for the unlearning methods incorporating SURE is presented in Table 5. |
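The paper's headline result (forgotten knowledge recovering from 21% to 83% after 4-bit quantization) stems from quantization mapping the unlearned weights back onto nearly the same grid points as the original weights. The toy sketch below (not the authors' code; `quantize_rtn` and all values are illustrative assumptions) uses symmetric round-to-nearest quantization to show how a small utility-constrained unlearning update can vanish at 4-bit precision while surviving at 16-bit:

```python
# Toy illustration of the quantization-failure mechanism: utility-constrained
# unlearning makes small weight changes, and coarse round-to-nearest (RTN)
# quantization can map the updated weight back to the original grid point.

def quantize_rtn(w: float, bits: int = 4, w_max: float = 1.0) -> float:
    """Symmetric round-to-nearest quantization of a scalar weight."""
    levels = 2 ** (bits - 1) - 1      # e.g. 7 representable magnitudes at 4-bit
    scale = w_max / levels
    return round(w / scale) * scale

original = 0.52    # hypothetical pre-unlearning weight
unlearned = 0.55   # small shift from a gradient-based unlearning step

# At 4-bit, both weights snap to the same grid point: the update is erased.
print(quantize_rtn(original, bits=4) == quantize_rtn(unlearned, bits=4))

# At 16-bit, the grid is fine enough to preserve the unlearning update.
print(quantize_rtn(original, bits=16) != quantize_rtn(unlearned, bits=16))
```

In the paper's actual experiments this effect is measured with real quantization schemes (e.g. 4-bit post-training quantization of LLaMA-2 7B); the scalar sketch only shows why coarse grids favor recovery of the original, pre-unlearning weights.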