LoFiT: Localized Fine-tuning on LLM Representations

Authors: Fangcong Yin, Xi Ye, Greg Durrett

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate LOFIT on question answering (QA), multi-hop reasoning, and counterfactual reasoning tasks, which are common settings for evaluating interpretability-motivated methods [21, 60]. We focus on a relatively low-data condition: for each dataset, we sample 500 training points or fewer, to be consistent with the common low-data setup of representation intervention methods."
Researcher Affiliation | Academia | Fangcong Yin (The University of Texas at Austin, EMAIL); Xi Ye (Princeton University, EMAIL); Greg Durrett (The University of Texas at Austin, EMAIL)
Pseudocode | No | The paper describes the LOFIT methodology in text and Figure 1, but does not include a formally labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | "Our code is available at https://github.com/fc2869/lo-fit."
Open Datasets | Yes | "TruthfulQA [24] is a QA dataset with questions where humans are likely to give false answers because of common misconceptions." (Section 4) "TruthfulQA [24] uses the Apache-2.0 license and data is available at: https://github.com/sylinrl/TruthfulQA." (Appendix I)
Dataset Splits | Yes | "TruthfulQA [24]... We follow the setup in [21] to split TruthfulQA into train/dev/test sets of 326/82/407 questions... CLUTRR [41]... We use the subset of 2-hop questions and randomly split the data into train/dev/test sets of 300/450/450 QA pairs. MQuAKE [58]... Data is randomly split into train/dev/test sets of 134/95/864 QA pairs."
Hardware Specification | Yes | "We fine-tune LOFIT and baselines using a single NVIDIA RTX A6000 GPU with 48GB memory."
Software Dependencies | No | "We use the Hugging Face implementation of Transformers [51] in PyTorch for all fine-tuning, and the TRL [50] implementation of direct preference optimization [37] for fine-tuning on TruthfulQA."
Experiment Setup | Yes | "We fine-tune LOFIT and baselines using a single NVIDIA RTX A6000 GPU with 48GB memory. We use the Hugging Face implementation of Transformers [51] in PyTorch for all fine-tuning, and the TRL [50] implementation of direct preference optimization [37] for fine-tuning on TruthfulQA. We use the AdamW optimizer for fine-tuning [26] with ε = 1e-8 and a weight decay factor of 0.01." (Appendix C.1) "For all experiments... we fine-tuned for 5 epochs with a batch size of 8... Method-specific hyperparameters can be found in the following subsections. Hyperparameters of LOFIT used in each experiment are summarized in Table 6." (Appendix D)
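The dataset-splits row quotes random train/dev/test partitions of fixed sizes (e.g., 300/450/450 for CLUTRR). A minimal sketch of such a split is below; the seed, the shuffle-then-slice procedure, and the helper name `split_dataset` are assumptions for illustration, not details from the paper.

```python
import random

def split_dataset(examples, sizes, seed=0):
    """Shuffle `examples` (seed and procedure are assumed, not from the
    paper) and partition them into consecutive splits of the given sizes."""
    assert sum(sizes) <= len(examples)
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    splits, start = [], 0
    for n in sizes:
        splits.append(shuffled[start:start + n])
        start += n
    return splits

# CLUTRR-style split: 300 train / 450 dev / 450 test QA pairs
train, dev, test = split_dataset(range(1200), (300, 450, 450))
```

Shuffling once and slicing consecutively guarantees the three splits are disjoint, which is the property a reproducibility reviewer would want to verify.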
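The quoted setup fixes three quantities — at most 500 training points, batch size 8, and 5 epochs — which together bound the total number of optimizer steps per run. The back-of-the-envelope arithmetic below combines them; the helper `total_steps` and the step-count framing are illustrative, not from the paper.

```python
import math

def total_steps(num_examples, batch_size=8, epochs=5):
    """ceil(num_examples / batch_size) gradient steps per epoch,
    using the batch size and epoch count quoted in the setup row."""
    return math.ceil(num_examples / batch_size) * epochs

# TruthfulQA train split of 326 questions:
# ceil(326 / 8) = 41 batches per epoch, times 5 epochs = 205 steps
```

At the quoted 500-example ceiling this is at most 315 steps, which illustrates why the authors describe the condition as low-data.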