On Effects of Steering Latent Representation for Large Language Model Unlearning
Authors: Huu-Tien Dang, Tin Pham, Hoang Thanh-Tung, Naoya Inoue
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that Adaptive RMU significantly improves unlearning performance compared to prior art while incurring no additional computational cost. Experimental results show that Adaptive RMU achieves a larger drop in accuracy on forget knowledge while maintaining high performance on general knowledge, and enables effective unlearning for most layers without additional computational overhead. |
| Researcher Affiliation | Academia | Dang Huu-Tien¹, Tin Pham¹, Hoang Thanh-Tung², and Naoya Inoue¹·³; ¹Japan Advanced Institute of Science and Technology; ²VNU University of Engineering and Technology, Vietnam; ³RIKEN |
| Pseudocode | Yes | Algorithm 1: Adaptive RMU pseudocode |
| Open Source Code | Yes | Our code is available at https://github.com/RebelsNLU-jaist/llm-unlearning. |
| Open Datasets | Yes | We use WMDP-Biology and WMDP-Cyber forget datasets as Dforget and Wikitext (Merity et al. 2022) as Dretain for unlearning the LLM. Unlearned models are evaluated on WMDP Q&A datasets and MMLU (Hendrycks et al. 2021). |
| Dataset Splits | No | The paper mentions using 'WMDP-Biology and WMDP-Cyber forget datasets as Dforget and Wikitext (Merity et al. 2022) as Dretain' for unlearning and 'WMDP Q&A datasets and MMLU (Hendrycks et al. 2021)' for evaluation. However, it does not specify explicit percentages, counts, or a methodology for splitting these datasets into training, validation, or test sets within the scope of this research. |
| Hardware Specification | Yes | Two NVIDIA A40s with 90GB of GPU memory were used to run the experiments. |
| Software Dependencies | No | The paper mentions 'AdamW' as an optimizer but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, Python version) that were used for implementation. |
| Experiment Setup | Yes | Models were fine-tuned using AdamW (Loshchilov and Hutter 2019) with learning rate η = 5e-5, batch size of 4, max sequence length of 512 for WMDP-Biology and 768 for WMDP-Cyber, with T = 500 gradient update steps. The retain weight α = 1200. For the baseline RMU, we follow the previous work and let c = 6.5. We grid search for the unlearn layer l from the third to the last layer. For Adaptive RMU, we grid search for the scaling factor β ∈ {2, 3, 5, 10}. We report the performance of Adaptive RMU models with β = 5. We update the parameters of three layers {l, l−1, l−2} of the model. |
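The hyperparameters in the Experiment Setup row plug into an RMU-style objective: push forget-set activations at the unlearn layer toward a scaled random direction, while anchoring retain-set activations to the frozen model's. The following is a minimal numpy sketch of that loss, not the authors' code; the toy hidden size, variable names, and the `rmu_loss` helper are illustrative assumptions, and Adaptive RMU is sketched here as replacing the fixed coefficient c with β times the norm of the forget activation, with β = 5 and α = 1200 as reported above.

```python
import numpy as np

rng = np.random.default_rng(0)

def rmu_loss(h_forget, h_retain, h_retain_frozen, u, c, alpha=1200.0):
    """RMU-style objective (sketch): steer forget-set activations toward the
    scaled random direction c*u; keep retain-set activations unchanged."""
    forget_term = np.mean((h_forget - c * u) ** 2)
    retain_term = np.mean((h_retain - h_retain_frozen) ** 2)
    return forget_term + alpha * retain_term

# Toy activations at the unlearn layer l (hidden size 8 for illustration).
h_forget = rng.normal(size=8)
h_retain = rng.normal(size=8)
h_retain_frozen = h_retain.copy()   # retain activations start unperturbed
u = rng.normal(size=8)
u /= np.linalg.norm(u)              # random unit steering direction

# Baseline RMU: fixed coefficient c = 6.5 (as in the setup row).
fixed = rmu_loss(h_forget, h_retain, h_retain_frozen, u, c=6.5)

# Adaptive RMU (sketch): coefficient scales with the activation norm,
# c = beta * ||h_forget||, beta = 5 as reported above.
adaptive = rmu_loss(h_forget, h_retain, h_retain_frozen, u,
                    c=5.0 * np.linalg.norm(h_forget))
```

In a real run this loss would be minimized with AdamW (η = 5e-5, T = 500 steps) over the parameters of layers {l, l−1, l−2}, with `h_forget` and `h_retain` drawn from batches of D_forget and D_retain respectively.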