Fast Exact Unlearning for In-Context Learning Data for LLMs

Authors: Andrei Ioan Muresanu, Anvith Thudi, Michael R. Zhang, Nicolas Papernot

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluations explored how ERASE compared to existing exact unlearning baselines. We conducted experiments across Big-Bench Instruction Induction (BBII) tasks and compared the performance of ERASE to variants of SISA (Bourtoule et al., 2021), an optimized exact unlearning algorithm for SGD-based learning.
Researcher Affiliation | Academia | 1 Department of Computer Science, University of Waterloo, Waterloo, Canada; 2 Vector Institute, Toronto, Canada; 3 Department of Computer Science, University of Toronto, Toronto, Canada. Correspondence to: Andrei Muresanu <EMAIL>.
Pseudocode | Yes | Algorithm 1: In-context Learning with ERASE. Require: a set of training examples D, the desired number of in-context examples k, and a quantization parameter ε. Ensure: examples q^(i) = [q^(i)_1, q^(i)_2, ..., q^(i)_k] for in-context learning.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it contain an explicit statement about code release or a link to a repository.
Open Datasets | Yes | Task Selection: The 15 tasks we evaluate on are from Big Bench (Srivastava et al., 2023), released under the Apache 2.0 license.
Dataset Splits | No | The paper refers to evaluating on the "entire test set" (Section 5.3) and to choosing the learning rate with the lowest test perplexity on the intent recognition dataset (Section 5.1), but it does not explicitly specify training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) for its experimental setup.
Hardware Specification | Yes | All experiments were run on a single node containing four Nvidia A40 GPUs.
Software Dependencies | No | The paper mentions using a pipeline based on Alpa (Zheng et al., 2022) and the Flops Profiler package (Li, 2023), but does not give version numbers for these or for other key software components such as the programming language (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA.
Experiment Setup | Yes | We use a block size of 256 tokens and a batch size of 8. We use the Adam optimizer (Kingma & Ba, 2017) with β1 = 0.9, β2 = 0.98, weight decay of 0.01, and a learning rate of 1e-5. We also use 10 warm-up steps with a linear schedule. The full list of training parameters can be found in Table E.