On Effects of Steering Latent Representation for Large Language Model Unlearning

Authors: Huu-Tien Dang, Tin Pham, Hoang Thanh-Tung, Naoya Inoue

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that Adaptive RMU significantly improves unlearning performance over prior art while incurring no additional computational cost. Experimental results show that Adaptive RMU achieves a larger drop in accuracy on forget knowledge, maintains high performance on general knowledge, and enables effective unlearning for most layers without additional computational overhead.
Researcher Affiliation | Academia | Dang Huu-Tien (1), Tin Pham (1), Hoang Thanh-Tung (2), and Naoya Inoue (1,3) — (1) Japan Advanced Institute of Science and Technology; (2) VNU University of Engineering and Technology, Vietnam; (3) RIKEN
Pseudocode | Yes | Algorithm 1: Adaptive RMU pseudocode
Open Source Code | Yes | Our code is available at https://github.com/RebelsNLU-jaist/llm-unlearning.
Open Datasets | Yes | We use the WMDP-Biology and WMDP-Cyber forget datasets as D_forget and WikiText (Merity et al. 2022) as D_retain for unlearning the LLM. Unlearned models are evaluated on the WMDP Q&A datasets and MMLU (Hendrycks et al. 2021).
Dataset Splits | No | The paper mentions using "WMDP-Biology and WMDP-Cyber forget datasets as D_forget and Wikitext (Merity et al. 2022) as D_retain" for unlearning and "WMDP Q&A datasets and MMLU (Hendrycks et al. 2021)" for evaluation. However, it does not specify explicit percentages, counts, or a methodology for splitting these datasets into training, validation, or test sets within the scope of this research.
Hardware Specification | Yes | Two NVIDIA A40s with 90GB GPU memory were used to run the experiments.
Software Dependencies | No | The paper mentions AdamW as the optimizer but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, Python version) used for the implementation.
Experiment Setup | Yes | Models were fine-tuned using AdamW (Loshchilov and Hutter 2019) with learning rate η = 5e-5, batch size of 4, max sequence length of 512 for WMDP-Biology and 768 for WMDP-Cyber, and T = 500 gradient update steps. The retain weight α = 1200. For the baseline RMU, we follow the previous work and set c = 6.5. We grid search the unlearn layer l from the third to the last layer. For Adaptive RMU, we grid search the scaling factor β ∈ {2, 3, 5, 10}. We report the performance of Adaptive RMU models with β = 5. We update the parameters of three layers {l, l−1, l−2} of the model.
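The experiment-setup row above can be illustrated with a minimal sketch of the RMU-style objective the paper builds on. This is an assumed form written from the report's description, not the authors' code: RMU steers the updated model's layer-l activations on forget data toward a fixed random unit direction u scaled by a coefficient, while a retain term (weighted by α = 1200) keeps activations on retain data close to the frozen model's; the "adaptive" variant replaces the fixed coefficient c with β times the norm of the frozen model's forget-activation. Function and variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(a, b):
    """Mean squared error between two activation vectors."""
    return float(np.mean((a - b) ** 2))

def adaptive_rmu_loss(h_updated_forget, h_frozen_forget,
                      h_updated_retain, h_frozen_retain,
                      u, beta=5.0, alpha=1200.0):
    """Sketch of an Adaptive-RMU-style loss (assumed form).

    Forget term: push the updated model's activation on forget data
    toward the random unit direction u, scaled adaptively by
    beta * ||h_frozen_forget|| instead of a fixed coefficient c.
    Retain term: keep activations on retain data close to the
    frozen model's, weighted by alpha.
    """
    coeff = beta * np.linalg.norm(h_frozen_forget)
    forget_loss = mse(h_updated_forget, coeff * u)
    retain_loss = mse(h_updated_retain, h_frozen_retain)
    return forget_loss + alpha * retain_loss
```

In the paper's reported configuration this loss would be minimized with AdamW (lr 5e-5, batch size 4) for T = 500 steps, updating only the parameters of layers {l, l−1, l−2}; the adaptive coefficient is what removes the need to hand-tune c per layer.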