Not All Wrong is Bad: Using Adversarial Examples for Unlearning

Authors: Ali Ebrahimpour-Boroojeny, Hari Sundaram, Varun Chandrasekaran

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Using AMUN for unlearning a random 10% of CIFAR-10 samples, we observe that even SOTA membership inference attacks cannot do better than random guessing. As we will show in Section 6, AMUN outperforms prior state-of-the-art (SOTA) unlearning methods (Fan et al., 2024) in unlearning random subsets of the training data from a trained classification model and closes the gap with the retrained models, even when there is no access to the samples in D \ DF during the unlearning procedure.
Researcher Affiliation: Academia. University of Illinois at Urbana-Champaign (UIUC), Illinois, USA. Correspondence to: Ali Ebrahimpour-Boroojeny <EMAIL>.
Pseudocode: Yes.
Algorithm 1 Build Adversarial Set (F, A, DF, ϵinit)
1: Input: Model F, attack algorithm A, forget set DF, and initial ϵ for the adversarial attack
2: Output: DA: adversarial set for DF
3: DA = {}
4: for (x, y) in DF do
5:   ϵ = ϵinit
6:   while TRUE do
7:     xadv = A(x, ϵ)
8:     yadv = F(xadv)
9:     if yadv != y then
10:      Break
11:    end if
12:    ϵ = 2ϵ
13:  end while
14:  Add (xadv, yadv) to DA
15: end for
16: Return DA
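A minimal Python sketch of Algorithm 1, assuming the model and attack are supplied as callables; the function name, toy threshold model, and shift attack below are illustrative only, not the authors' implementation:

```python
# Sketch of Algorithm 1 (Build Adversarial Set): for each forget-set sample,
# double the attack budget eps until the model's prediction flips, then
# record the adversarial pair. `model` and `attack` are caller-supplied.

def build_adversarial_set(model, attack, forget_set, eps_init):
    adv_set = []
    for x, y in forget_set:
        eps = eps_init
        while True:
            x_adv = attack(x, eps)   # perturb x within an eps-budget
            y_adv = model(x_adv)     # model's label for the perturbed input
            if y_adv != y:           # stop once the label flips
                break
            eps *= 2                 # otherwise double the budget and retry
        adv_set.append((x_adv, y_adv))
    return adv_set

# Toy 1-D example: a threshold "model" and an "attack" that shifts x by eps.
toy_model = lambda x: int(x > 0)
toy_attack = lambda x, eps: x + eps
print(build_adversarial_set(toy_model, toy_attack, [(-3.0, 0)], 1.0))
# -> [(1.0, 1)]: eps grows 1 -> 2 -> 4 before the label flips at x = 1.0
```

In the paper's setting, `model` would be the trained classifier F and `attack` the adversarial attack A (PGD with an ℓ2 bound); the doubling loop guarantees each forget sample receives a label-flipping perturbation.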
Open Source Code: Yes. All code can be found at https://github.com/Ali-E/AMUN
Open Datasets: Yes. Using AMUN for unlearning a random 10% of CIFAR-10 samples, we observe that even SOTA membership inference attacks cannot do better than random guessing. Setup: We consider the training set of CIFAR-10 as D and choose DF to be a random subset whose size is 10% of |D|. Similar experiments for VGG19 models trained on the Tiny ImageNet dataset (Le & Yang, 2015) can be found in [...]. In the leftmost sub-figure, the curve presented as Orig represents the test accuracy of the model when it is fine-tuned on D.
Dataset Splits: Yes. Using AMUN for unlearning a random 10% of CIFAR-10 samples, we observe that even SOTA membership inference attacks cannot do better than random guessing. Setup: We consider the training set of CIFAR-10 as D and choose DF to be a random subset whose size is 10% of |D|. In each of these two settings, we evaluate unlearning of 10% or 50% of the samples, randomly chosen from D. Our goal is to unlearn 10% of the training samples (5K), but this time in 5 consecutive sets of size 2% (1K) each.
Hardware Specification: Yes. This work used Delta computing resources at the National Center for Supercomputing Applications through allocation CIS240316 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program (Boerner et al., 2023), which is supported by U.S. National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.
Software Dependencies: No. The paper does not explicitly provide version numbers for key software components or libraries used, such as Python, PyTorch, or CUDA.
Experiment Setup: Yes. Fig. 1 shows the fine-tuning of a trained ResNet-18 model for 20 epochs. For our experiments, we use PGD-50 (Madry, 2017) with an ℓ2 norm bound as AF. We set the step size of the gradient ascent in the attack to 0.1ϵ, which changes with the ϵ value. For tuning the hyper-parameters of the models, we followed the same ranges suggested by their authors and used in prior works for comparison. Similar to prior works (Liu et al., 2024; Fan et al., 2024), we performed 10 epochs for each of the unlearning methods, and searched for the best learning rate and number of steps for a learning-rate scheduler. More specifically, for each unlearning method, we performed a grid search on learning rates within the range [10⁻⁶, 10⁻¹], with an optional scheduler that scales the learning rate by 0.1 every 1 or 5 steps. For SalUn, whether used on its own or in combination with AMUN, we searched for masking ratios in the range [0.1, 0.9]. The original models are ResNet-18 models trained for 200 epochs with a learning rate initialized at 0.1 and a scheduler that scales the learning rate by 0.1 every 40 epochs.
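The ℓ2-bounded attack described above can be sketched in pure Python on a toy gradient: normalized gradient-ascent steps of size 0.1ϵ, projected back onto the ϵ-ball, repeated 50 times as in PGD-50. The function name and the constant toy gradient are illustrative, not the paper's code, which operates on image tensors with a model-loss gradient:

```python
# Hedged sketch of l2-bounded PGD: step size alpha = 0.1 * eps along the
# normalized gradient, then projection onto the eps-ball around x0.
import math

def l2_pgd(x0, grad_fn, eps, steps=50):
    alpha = 0.1 * eps                        # step size scales with eps
    x = list(x0)
    for _ in range(steps):
        g = grad_fn(x)
        g_norm = math.sqrt(sum(v * v for v in g)) or 1.0
        x = [xi + alpha * gi / g_norm for xi, gi in zip(x, g)]
        delta = [xi - x0i for xi, x0i in zip(x, x0)]
        d_norm = math.sqrt(sum(d * d for d in delta))
        if d_norm > eps:                     # project back onto the eps-ball
            x = [x0i + d * (eps / d_norm) for x0i, d in zip(x0, delta)]
    return x

# Toy constant-gradient example: the perturbation saturates at the eps bound.
x_adv = l2_pgd([0.0, 0.0], lambda x: [1.0, 0.0], eps=0.5)
print(x_adv)  # first coordinate ~= 0.5 (the eps bound), second stays 0.0
```

Because the step size is tied to ϵ, the doubling schedule in Algorithm 1 automatically coarsens the search as the budget grows, which matches the quoted remark that the step size "changes with the ϵ value".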