Textual Unlearning Gives a False Sense of Unlearning

Authors: Jiacheng Du, Zhibo Wang, Jie Zhang, Xiaoyi Pang, Jiahui Hu, Kui Ren

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive evaluations, our findings critically reveal that textual unlearning actually gives a false sense of unlearning! Existing methods for textual unlearning fail to completely erase unlearned texts, and their deployment will instead introduce heightened privacy risks. Specifically, we highlight the following key findings: Unlearned texts remain detectable with high confidence on the unlearned LMs. ... We present the results on Pythia-1.4b and the SynthPAI-age dataset in Table 1, and results in other settings are provided in Appendix E.
Researcher Affiliation | Academia | (1) The State Key Laboratory of Blockchain and Data Security, Zhejiang University, P. R. China; (2) School of Cyber Science and Technology, Zhejiang University, P. R. China; (3) ETH Zurich, Switzerland. Correspondence to: Zhibo Wang <EMAIL>.
Pseudocode | Yes | The detailed process of U-LiRA+ is outlined in Algorithm 1. To rigorously audit the unlearning algorithm U, we first sample an audit dataset D_audit from the training data distribution and mislabel it. ... Algorithm 2: TULA-MI in the Strict Case ... Algorithm 3: TULA-DR ... Algorithm 4: TULA-MI in the Relaxed Case
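The U-LiRA+ procedure is only described as pseudocode in the paper. For readers unfamiliar with the LiRA-style per-sample test that such audits build on, here is a minimal sketch of the core likelihood-ratio score. The function name `lira_score` and the Gaussian model of the loss distributions are illustrative assumptions, not the paper's implementation:

```python
import math
import statistics

def lira_score(target_loss, in_losses, out_losses):
    """LiRA-style per-example membership score (sketch, not the paper's code).

    Fits a Gaussian to the losses of reference models trained WITH the
    sample ("in") and WITHOUT it ("out"), then returns
    log p(loss | in) - log p(loss | out). Higher => more likely a member,
    i.e. the sample still leaves a detectable trace in the model.
    """
    def log_pdf(x, mu, sigma):
        sigma = max(sigma, 1e-6)  # guard against a degenerate (zero-variance) fit
        return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

    mu_in, sd_in = statistics.mean(in_losses), statistics.stdev(in_losses)
    mu_out, sd_out = statistics.mean(out_losses), statistics.stdev(out_losses)
    return log_pdf(target_loss, mu_in, sd_in) - log_pdf(target_loss, mu_out, sd_out)
```

In an unlearning audit of this style, the score is computed for both the supposedly unlearned samples and held-out unseen samples on the unlearned model; if the two score distributions remain separable, the unlearned texts are still detectable.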
Open Source Code | No | The text does not contain an explicit statement about releasing the source code or a link to a repository for the methodology described in this paper.
Open Datasets | Yes | To ensure rigorous evaluation, we construct two synthetic datasets derived from the SynthPAI dataset (Yukhymenko et al., 2024), enabling fine-tuning on previously unseen data.
Dataset Splits | No | The paper mentions 'Train ACC and Test ACC' in Table 4, indicating train/test splits were used, but it does not specify the exact percentages, sample counts, or methodology for these splits for the main datasets (SynthPAI-age and SynthPAI-inc) used to fine-tune the LMs. It only details how audited samples are divided into unlearned and unseen parts.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, memory specifications) used to run the experiments.
Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and 'LightGBM classifier' but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA) required for replication.
Experiment Setup | Yes | We utilize the AdamW optimizer (Loshchilov, 2017) to full-parameter fine-tune the model to obtain M_original, with the learning rate set to 1e-5, the batch size to 64, and the number of training rounds to 2. ... For TULA-DR, the learning rate α is set to 0.1 for the Pythia-1.4b model and 0.45 for the OPT-1.3b model, and the regularization factor β is fixed at 0.1.
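The setup above names AdamW with specific hyperparameters but no code is released. As a sketch of what that optimizer step computes, here is a generic pure-Python AdamW update (Adam moments plus Loshchilov & Hutter's decoupled weight decay), with the paper-reported fine-tuning settings collected in a config dict. The `adamw_step` helper and `CONFIG` names are illustrative assumptions, not the authors' code:

```python
import math

def adamw_step(params, grads, state, lr=1e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update over flat lists of scalar parameters (sketch).

    AdamW = Adam's bias-corrected moment estimates, with weight decay
    applied directly to the parameters rather than folded into the gradient.
    """
    state["t"] += 1
    t = state["t"]
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient.
        m = state["m"][i] = betas[0] * state["m"][i] + (1 - betas[0]) * g
        v = state["v"][i] = betas[1] * state["v"][i] + (1 - betas[1]) * g * g
        m_hat = m / (1 - betas[0] ** t)  # bias correction
        v_hat = v / (1 - betas[1] ** t)
        # Decoupled weight decay: shrink p separately from the Adam direction.
        p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
        updated.append(p)
    return updated

# Fine-tuning settings reported in the paper's experiment setup:
CONFIG = {"lr": 1e-5, "batch_size": 64, "training_rounds": 2}
```

With the reported learning rate of 1e-5, each parameter moves by at most roughly 1e-5 per step (the Adam direction has unit scale), which is typical for full-parameter fine-tuning of billion-scale LMs.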