Composable Interventions for Language Models
Authors: Arinbjörn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using our framework, we conduct extensive experiments and compose popular methods from three emerging intervention categories: knowledge editing, model compression, and machine unlearning. Our results over 417 different compositions uncover meaningful interactions: compression hinders editing and unlearning, composing interventions hinges on their order of application, and popular general-purpose metrics are inadequate for assessing composability. |
| Researcher Affiliation | Collaboration | Arinbjörn Kolbeinsson* (University of Virginia & Askan, EMAIL); Kyle O'Brien* (EleutherAI); Tianjin Huang* (University of Exeter); Shanghua Gao (Harvard Medical School); Shiwei Liu (University of Oxford); Jonathan Richard Schwarz (Thomson-Reuters Foundational Research); Anurag Vaidya (Harvard Medical School, Mass General Brigham); Faisal Mahmood (Harvard Medical School, Mass General Brigham); Marinka Zitnik (Harvard Medical School); Tianlong Chen (UNC Chapel Hill); Tom Hartvigsen (University of Virginia & Thomson-Reuters Foundational Research, EMAIL) |
| Pseudocode | No | The paper describes methods and metrics using textual descriptions and mathematical equations (e.g., Equation 1 and Equation 2 for composability metrics) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All of our code is available at: github.com/hartvigsen-group/composable-interventions |
| Open Datasets | Yes | We use the zsRE (Levy et al., 2017) dataset, which is a popular question-answering benchmark for knowledge editing. ... We evaluate unlearning with the Weapons of Mass Destruction Proxy (WMDP) (Li et al., 2024a)... we make the standard choice to evaluate question answering accuracy on MMLU (Hendrycks et al., 2020) using the LM Eval Harness (Gao et al., 2023). |
| Dataset Splits | Yes | All results for knowledge editing methods are averaged over 10 batches of 50 randomly-selected edits from zsRE. ... We average the performance on WMDP's cyber and bio splits, totaling 3,260 questions. |
| Hardware Specification | No | We thank the University of Virginia Research Computing team for providing access to excellent high-performance computing resources. |
| Software Dependencies | No | The paper mentions various models and methods like 'Llama3-8B (AI@Meta, 2024)', 'MEMIT (Meng et al., 2023)', 'LoRA (Hu et al., 2021)', 'SparseGPT (Frantar & Alistarh, 2023)', 'Wanda (Sun et al., 2023)', 'GPTQ (Frantar et al., 2023)', 'AWQ (Lin et al., 2023)', and refers to the 'RMU repo' but does not specify version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | All experiments in our main results (Section 4) are performed with Llama3-8B (AI@Meta, 2024)... We use the state-of-the-art MEMIT (Meng et al., 2023) model editor, which applies batches of edits simultaneously. The editing process was applied to layers 4 through 8 of the model, with a clamp normalization factor set at 4. The learning parameters adhered closely to the original implementation: v_num_grad_steps was set to 25, accompanied by a learning rate (lr) of 0.5, and using the last layer for loss calculation. Additionally, a weight decay (weight_decay) of 0.001 was employed. The KL divergence contribution to the overall loss was controlled by a KL_factor of 0.0625. Moreover, a second momentum adjustment was enabled, with an update weight of 15000, to fine-tune the optimization process. Generation used a maximum length of 40 tokens and a batch size of 50, matching the number of edits being made. Each batch of edits was repeated 10 times and the results averaged. |
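For readers attempting reproduction, the MEMIT hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. The key names below are assumptions on our part, modeled loosely on the style of MEMIT-like config files; they are not taken from the paper's released code:

```python
# Hypothetical MEMIT editing configuration, transcribed from the
# hyperparameters reported in the paper's experiment setup.
# Key names are illustrative, not the authors' actual config schema.
memit_config = {
    "layers": list(range(4, 9)),   # edit layers 4 through 8 (inclusive)
    "clamp_norm_factor": 4,        # clamp normalization factor
    "v_num_grad_steps": 25,        # gradient steps per edit
    "v_lr": 0.5,                   # learning rate
    "v_weight_decay": 0.001,       # weight decay
    "kl_factor": 0.0625,           # KL-divergence weight in the loss
    "mom2_update_weight": 15000,   # second-moment update weight
    "max_length": 40,              # max generated tokens
    "batch_size": 50,              # edits applied simultaneously per batch
    "n_repeats": 10,               # edit batches repeated, results averaged
}
```

Collecting the values this way makes it easy to spot what a reproduction would still need to pin down, such as the exact loss-layer choice and library versions, which the paper does not version.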