Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
Authors: Sungmin Cha, Sungjun Cho, Dasol Hwang, Moontae Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the Training Data Extraction Challenge dataset using GPT-Neo models as well as on the TOFU benchmark with Phi-1.5B and Llama2-7B models demonstrate that our approach effectively removes sensitive information while maintaining reasoning and generative capabilities with minimal impact. |
| Researcher Affiliation | Collaboration | Sungmin Cha (New York University), Sungjun Cho (University of Wisconsin-Madison), Dasol Hwang (LG AI Research), Moontae Lee (LG AI Research; University of Illinois Chicago) |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations, but no distinct pseudocode or algorithm blocks are present. |
| Open Source Code | Yes | Our implementation can be found in https://github.com/csm9493/efficient-llm-unlearning. |
| Open Datasets | Yes | Experiments on the Training Data Extraction Challenge dataset using GPT-Neo models as well as on the TOFU benchmark with Phi-1.5B and Llama2-7B models demonstrate that our approach effectively removes sensitive information while maintaining reasoning and generative capabilities with minimal impact. The Training Data Extraction Challenge (TDEC) dataset (Carlini et al., 2021) consists of 20k examples from the Pile dataset (Gao et al., 2020) found to be easily extractable from a pretrained LLM. For the retain set Dr, we use a subset of WikiText (Merity et al., 2017). The Task of Fictitious Unlearning (TOFU) benchmark (Maini et al., 2024). |
| Dataset Splits | Yes | For each experiment, we randomly sample 32 sequences of 200 tokens to constitute the forget set Df. For the retain set Dr, we use a subset of WikiText (Merity et al., 2017). ...our task is to unlearn all information regarding 1%, 5%, or 10% of the authors from the model. Note that we can obtain reference models finetuned only on the retain set (QA pairs on 99%, 95%, or 90% of the authors). |
| Hardware Specification | Yes | All experiments were conducted on a remote server equipped with NVIDIA A100 40GB Tensor Core GPUs. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer (Loshchilov & Hutter, 2019)' but does not provide specific version numbers for key software components or libraries (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For this experiment, we use a fixed learning rate of 2e-4 and use LoRA adapters with rank r = {4, 8, 16, 32}. For unlearning, we use a learning rate of 2e-4 if our base model is from Phi-1.5B and 1e-4 for Llama2-7B. All training procedures run 5 epochs with an effective batch size of 32 using the AdamW optimizer (Loshchilov & Hutter, 2019). |
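The setup row above reports LoRA adapters with rank r = {4, 8, 16, 32}. For readers unfamiliar with the mechanism, the following is a minimal NumPy sketch (not the authors' implementation) of how a rank-r LoRA update augments a frozen linear layer; all dimensions and initializations here are illustrative assumptions.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Linear layer with a LoRA adapter: y = x W^T + (alpha/r) * x A^T B^T.

    x: (batch, d_in) activations
    W: (d_out, d_in) frozen pretrained weight
    A: (r, d_in)  trainable down-projection (rank r)
    B: (d_out, r) trainable up-projection (zero-initialized in standard LoRA)
    """
    r = A.shape[0]
    base = x @ W.T                           # frozen pretrained path
    update = (x @ A.T) @ B.T * (alpha / r)   # low-rank trainable update
    return base + update

# Illustrative dimensions (assumed, not from the paper).
rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8
x = rng.standard_normal((2, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # zero init: adapter starts as a no-op

y = lora_forward(x, W, A, B)
assert y.shape == (2, d_out)
assert np.allclose(y, x @ W.T)  # with B = 0, output matches the frozen layer
```

Only A and B are updated during unlearning, so the trainable parameter count is r * (d_in + d_out) per adapted layer rather than d_in * d_out, which is why sweeping r over {4, 8, 16, 32} controls the parameter efficiency of the method.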