Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
Authors: Sungmin Cha, Sungjun Cho, Dasol Hwang, Moontae Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the Training Data Extraction Challenge dataset using GPT-Neo models as well as on the TOFU benchmark with Phi-1.5B and Llama2-7B models demonstrate that our approach effectively removes sensitive information while maintaining reasoning and generative capabilities with minimal impact. |
| Researcher Affiliation | Collaboration | Sungmin Cha (New York University), Sungjun Cho (University of Wisconsin-Madison), Dasol Hwang (LG AI Research), Moontae Lee (LG AI Research; University of Illinois Chicago) |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations, but no distinct pseudocode or algorithm blocks are present. |
| Open Source Code | Yes | Our implementation can be found in https://github.com/csm9493/efficient-llm-unlearning. |
| Open Datasets | Yes | Experiments on the Training Data Extraction Challenge dataset using GPT-Neo models as well as on the TOFU benchmark with Phi-1.5B and Llama2-7B models demonstrate that our approach effectively removes sensitive information while maintaining reasoning and generative capabilities with minimal impact. The Training Data Extraction Challenge (TDEC) dataset (Carlini et al., 2021) consists of 20k examples from the Pile dataset (Gao et al., 2020) found to be easily extractable from a pretrained LLM. For the retain set Dr, we use a subset of WikiText (Merity et al., 2017). The Task of Fictitious Unlearning (TOFU) benchmark (Maini et al., 2024). |
| Dataset Splits | Yes | For each experiment, we randomly sample 32 sequences of 200 tokens to constitute the forget set Df. For the retain set Dr, we use a subset of WikiText (Merity et al., 2017). ...our task is to unlearn all information regarding 1%, 5%, or 10% of the authors from the model. Note that we can obtain reference models finetuned only on the retain set (QA pairs on 99%, 95%, or 90% of the authors). |
| Hardware Specification | Yes | All experiments were conducted on a remote server equipped with NVIDIA A100 40GB Tensor Core GPUs. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer (Loshchilov & Hutter, 2019)' but does not provide specific version numbers for key software components or libraries (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For this experiment, we use a fixed learning rate of 2e-4 and use LoRA adapters with rank r = {4, 8, 16, 32}. For unlearning, we use a learning rate of 2e-4 if our base model is from Phi-1.5B and 1e-4 for Llama2-7B. All training procedures run 5 epochs with an effective batch size of 32 using the AdamW optimizer (Loshchilov & Hutter, 2019). |
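The setup row above reports LoRA adapters with rank r = {4, 8, 16, 32}. For readers unfamiliar with the mechanism, the following is a minimal NumPy sketch (not the authors' implementation) of how a rank-r LoRA update augments a frozen linear layer; all dimensions and initializations here are illustrative assumptions.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Linear layer with a LoRA adapter: y = x W^T + (alpha/r) * x A^T B^T.

    x: (batch, d_in) activations
    W: (d_out, d_in) frozen pretrained weight
    A: (r, d_in)  trainable down-projection (rank r)
    B: (d_out, r) trainable up-projection (zero-initialized in standard LoRA)
    """
    r = A.shape[0]
    base = x @ W.T                           # frozen pretrained path
    update = (x @ A.T) @ B.T * (alpha / r)   # low-rank trainable update
    return base + update

# Illustrative dimensions (assumed, not from the paper).
rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8
x = rng.standard_normal((2, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # zero init: adapter starts as a no-op

y = lora_forward(x, W, A, B)
assert y.shape == (2, d_out)
assert np.allclose(y, x @ W.T)  # with B = 0, output matches the frozen layer
```

Only A and B are updated during unlearning, so the trainable parameter count is r * (d_in + d_out) per adapted layer rather than d_in * d_out, which is why sweeping r over {4, 8, 16, 32} controls the parameter efficiency of the method.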