Fast Exact Unlearning for In-Context Learning Data for LLMs
Authors: Andrei Ioan Muresanu, Anvith Thudi, Michael R. Zhang, Nicolas Papernot
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluations explored how it compared to existing exact unlearning baselines. We conducted experiments across Big-Bench Instruction Induction (BBII) tasks, and compared performance of ERASE to variants of SISA (Bourtoule et al., 2021) (an optimized exact unlearning algorithm for SGD-based learning). |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Waterloo, Waterloo, Canada 2Vector Institute, Toronto, Canada 3Department of Computer Science, University of Toronto, Toronto, Canada. Correspondence to: Andrei Muresanu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: In-context Learning with ERASE. Require: a set of training examples D, the desired number of in-context examples k, and a quantization parameter ε. Ensure: examples q^(i) = [q^(i)_1, q^(i)_2, ..., q^(i)_k] for in-context learning. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It does not contain an explicit statement about code release or a link to a repository. |
| Open Datasets | Yes | Task Selection The 15 tasks we evaluate on are from Big Bench (Srivastava et al., 2023) (released under the Apache 2.0 license). |
| Dataset Splits | No | The paper refers to using Big Bench tasks and evaluating on the 'entire test set' in Section 5.3, and discusses hyperparameter tuning using the 'intent recognition dataset' to choose the learning rate with the 'lowest test perplexity' in Section 5.1. However, it does not explicitly provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or clear references to predefined splits used for their experimental setup). |
| Hardware Specification | Yes | All experiments were run on a single node containing four A40 Nvidia GPUs. |
| Software Dependencies | No | The paper mentions using 'a pipeline based on Alpa (Zheng et al., 2022)' and the 'Flops Profiler package (Li, 2023)', but does not provide specific version numbers for these or other key software components like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA. |
| Experiment Setup | Yes | We use a block size of 256 tokens and batch size of 8. We use the Adam optimizer (Kingma & Ba, 2017) with β1 = 0.9, β2 = 0.98, weight decay of 0.01, and learning rate of 1e-5. We also use 10 warm-up steps with a linear schedule. The full list of training parameters can be found in Table E. |
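The Algorithm 1 signature quoted above (training set D, number of in-context examples k, quantization parameter ε) suggests a quantization-stabilized selection of in-context examples. The paper's actual selection rule is not reproduced here; the following is a hypothetical illustration of how quantizing per-example scores to multiples of ε can make a top-k selection insensitive to the deletion of a single example, which is the property that enables fast exact unlearning:

```python
import math

def quantized_top_k(scores, k, eps):
    """Hypothetical illustration (not the paper's Algorithm 1):
    snap each score down to a multiple of eps, then pick k examples
    by (quantized score, index). Deterministic tie-breaking plus
    coarse score buckets mean that removing one example usually
    leaves the selection over the remaining examples unchanged,
    so unlearning rarely requires recomputation."""
    quantized = [(math.floor(s / eps) * eps, i) for i, s in enumerate(scores)]
    quantized.sort(key=lambda t: (-t[0], t[1]))  # high score first, then index
    return [i for _, i in quantized[:k]]
```

For example, with scores [0.91, 0.52, 0.87, 0.12], k=2, and ε=0.1, the buckets are 0.9, 0.5, 0.8, 0.1, so indices 0 and 2 are selected; deleting index 1 or 3 leaves that selection unchanged.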
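The Experiment Setup row specifies a learning rate of 1e-5 with 10 warm-up steps on a linear schedule. A minimal sketch of that warm-up rule, assuming the rate is held constant after warm-up (the quoted excerpt does not specify the post-warm-up behaviour):

```python
def warmup_linear_lr(step, base_lr=1e-5, warmup_steps=10):
    """Linear warm-up to base_lr over warmup_steps optimizer steps,
    matching the setup quoted above (lr 1e-5, 10 warm-up steps).
    Holding the rate constant afterwards is an assumption."""
    if step < warmup_steps:
        # ramp from base_lr / warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

The remaining Adam settings (β1 = 0.9, β2 = 0.98, weight decay 0.01) would be passed to the optimizer directly.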