Hessian-Free Online Certified Unlearning
Authors: Xinbao Qiao, Meng Zhang, Ming Tang, Ermin Wei
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our proposed scheme surpasses existing results by orders of magnitude in terms of time/storage costs with millisecond-level unlearning execution, while also enhancing test accuracy. We conduct experimental evaluations using a wider range of metrics compared to previous theoretical second-order studies, and release our open source code. The experimental results verify our theoretical analysis and demonstrate that our proposed approach surpasses previous certified unlearning works. In particular, our algorithm incurs millisecond-level unlearning runtime to forget per sample with minimal performance degradation. |
| Researcher Affiliation | Academia | 1 Zhejiang University, 2 Southern University of Science and Technology, 3 Northwestern University |
| Pseudocode | Yes | Algorithm 1: Hessian-Free Online Unlearning (HF) Algorithm |
| Open Source Code | Yes | We conduct experimental evaluations using a wider range of metrics compared to previous theoretical second-order studies, and release our open source code. The experimental results verify our theoretical analysis and demonstrate that our proposed approach surpasses previous certified unlearning works. |
| Open Datasets | Yes | We conduct experiments in both convex and non-convex scenarios. Specifically, we trained a multinomial Logistic Regression (LR) with total parameters d = 7850 and a simple convolutional neural network (CNN) with total parameters d = 21840 on the MNIST dataset (Deng (2012)) for handwriting digit classification. We further evaluate using the larger-scale model ResNet-18 (He et al. (2016)), which features 11M parameters, with three datasets: CIFAR-10 (Alex (2009)) for image classification, CelebA (Liu et al. (2015)) for gender prediction, and LFWPeople (Huang et al. (2007)) for face recognition across 29 different individuals. |
| Dataset Splits | Yes | We train LR and CNN on MNIST with 1,000 training data and 20% of data points to be forgotten, with setups identical to the aforementioned verification experiments I. We further evaluate on FMNIST with 4,000 training data and 20% of data points to be forgotten using CNN and LeNet with a total of 61,706 parameters. We conducted evaluation on ResNet-18 trained on CIFAR-10 with 50,000 samples. We conducted evaluation on ResNet-18 trained on LFW with 984 samples, for the classification of 29 facial categories. We conducted evaluation on ResNet-18 trained on CelebA with 10,000 samples. |
| Hardware Specification | Yes | The experiments were conducted on the NVIDIA GeForce RTX 4090. Our comprehensive tests were conducted on an AMD EPYC 7763 CPU @ 1.50 GHz with 64 cores under Ubuntu 20.04.6 LTS. |
| Software Dependencies | Yes | The code was implemented in PyTorch 2.0.0 and leverages CUDA Toolkit version 11.8. |
| Experiment Setup | Yes | For LR, training was performed for 15 epochs with a stepsize of 0.05 and a batch size of 32. For CNN, training was carried out for 20 epochs with a stepsize of 0.05 and a batch size of 64. Given these configurations, we separately assess the distance and correlation between the approximators a_HF, a_NS, and a_IJ at deletion rates in the set {1%, 5%, 10%, 15%, 20%, 25%, 30%}. Following the suggestion in Basu et al. (2021), a damping factor of 0.01 is added to the Hessian to ensure its invertibility when implementing NS and IJ. |
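The damping step quoted in the Experiment Setup row is a standard trick for the Newton-step (NS) and infinitesimal-jackknife (IJ) baselines: adding a small multiple of the identity to the Hessian guarantees invertibility when the Hessian is singular or ill-conditioned. A minimal NumPy sketch of this (the function name and toy matrix are illustrative, not from the paper's code):

```python
import numpy as np

def damped_hessian_inverse(hessian: np.ndarray, damping: float = 0.01) -> np.ndarray:
    """Invert the Hessian after adding `damping * I` to its diagonal.

    The damping factor (0.01 in the paper, following Basu et al. (2021))
    shifts every eigenvalue up by `damping`, so the matrix becomes
    invertible even when the raw Hessian is singular.
    """
    d = hessian.shape[0]
    return np.linalg.inv(hessian + damping * np.eye(d))

# Example: a rank-deficient Hessian (det = 0) that np.linalg.inv
# would reject becomes invertible after damping.
H = np.array([[1.0, 1.0],
              [1.0, 1.0]])
H_inv = damped_hessian_inverse(H, damping=0.01)
```

The damped inverse satisfies `(H + 0.01 * I) @ H_inv == I` up to floating-point error; the paper's HF scheme avoids forming this inverse altogether, which is the source of its time/storage savings over NS and IJ.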