Catastrophic Failure of LLM Unlearning via Quantization
Authors: Zhiwei Zhang, Fali Wang, Xiaomin Li, Zongyu Wu, Xianfeng Tang, Hui Liu, Qi He, Wenpeng Yin, Suhang Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments using various quantization techniques across multiple precision levels to thoroughly evaluate this phenomenon. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision, which significantly increases to 83% after 4-bit quantization. |
| Researcher Affiliation | Collaboration | Zhiwei Zhang¹, Fali Wang¹, Xiaomin Li², Zongyu Wu¹, Xianfeng Tang³, Hui Liu³, Qi He³, Wenpeng Yin¹, Suhang Wang¹ — ¹The Pennsylvania State University, ²Harvard University, ³Amazon (email addresses redacted) |
| Pseudocode | No | The paper includes mathematical equations (Equation 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13) and conceptual diagrams (Figure 1, 2) but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at: https://github.com/zzwjames/FailureLLMUnlearning. |
| Open Datasets | Yes | We conduct experiments on MUSE (Shi et al., 2024b), a benchmark for evaluating machine unlearning in language models, using two datasets: NEWS and BOOKS. The NEWS dataset (Li et al., 2023b) includes recent BBC news articles... The BOOKS dataset (Eldan & Russinovich, 2023) features the Harry Potter series, with original novels as the forget set and related Fan Wiki materials as the retain set... |
| Dataset Splits | No | The NEWS dataset (Li et al., 2023b) includes recent BBC news articles divided into forget, retain, and holdout sets. The BOOKS dataset (Eldan & Russinovich, 2023) features the Harry Potter series, with original novels as the forget set and related Fan Wiki materials as the retain set to preserve domain knowledge post-unlearning. |
| Hardware Specification | No | The paper mentions models like LLaMA-2 7B and ICLM-7B and states they were fine-tuned, but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for these experiments. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer (Loshchilov et al., 2017) but does not provide specific version numbers for software libraries, frameworks (like PyTorch or TensorFlow), or other key dependencies used in the implementation. |
| Experiment Setup | Yes | Following the experimental setup in (Shi et al., 2024b), we implement six unlearning methods: GA, GA_GDR, GA_KLR, NPO, NPO_GDR, and NPO_KLR, using the AdamW optimizer (Loshchilov et al., 2017) with a fixed learning rate of 1e-5. We conduct the experiments over 10 and 5 epochs for the NEWS and BOOKS datasets, respectively. A grid search across {2, 5, 10, 100, 300} determines the optimal weight α for the utility constraint on the retain dataset, balancing unlearning performance with model utility. Table 4 shows the regularization weight on the retain dataset for each method. The detailed hyperparameter selection for the unlearning methods incorporating SURE is presented in Table 5. |
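The paper's headline result (forgotten knowledge recovering from 21% to 83% after 4-bit quantization) stems from quantization mapping the unlearned weights back onto nearly the same grid points as the original weights. The toy sketch below (not the authors' code; `quantize_rtn` and all values are illustrative assumptions) uses symmetric round-to-nearest quantization to show how a small utility-constrained unlearning update can vanish at 4-bit precision while surviving at 16-bit:

```python
# Toy illustration of the quantization-failure mechanism: utility-constrained
# unlearning makes small weight changes, and coarse round-to-nearest (RTN)
# quantization can map the updated weight back to the original grid point.

def quantize_rtn(w: float, bits: int = 4, w_max: float = 1.0) -> float:
    """Symmetric round-to-nearest quantization of a scalar weight."""
    levels = 2 ** (bits - 1) - 1      # e.g. 7 representable magnitudes at 4-bit
    scale = w_max / levels
    return round(w / scale) * scale

original = 0.52    # hypothetical pre-unlearning weight
unlearned = 0.55   # small shift from a gradient-based unlearning step

# At 4-bit, both weights snap to the same grid point: the update is erased.
print(quantize_rtn(original, bits=4) == quantize_rtn(unlearned, bits=4))

# At 16-bit, the grid is fine enough to preserve the unlearning update.
print(quantize_rtn(original, bits=16) != quantize_rtn(unlearned, bits=16))
```

In the paper's actual experiments this effect is measured with real quantization schemes (e.g. 4-bit post-training quantization of LLaMA-2 7B); the scalar sketch only shows why coarse grids favor recovery of the original, pre-unlearning weights.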