LoFiT: Localized Fine-tuning on LLM Representations

Authors: Fangcong Yin, Xi Ye, Greg Durrett

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate LOFIT on question answering (QA), multi-hop reasoning, and counterfactual reasoning tasks, which are common settings for evaluating interpretability-motivated methods [21, 60]. We focus on a relatively low-data condition: for each dataset, we sample 500 training points or fewer, to be consistent with the common low-data setup of representation intervention methods."
Researcher Affiliation | Academia | Fangcong Yin (The University of Texas at Austin, EMAIL); Xi Ye (Princeton University, EMAIL); Greg Durrett (The University of Texas at Austin, EMAIL)
Pseudocode | No | The paper describes the LOFIT methodology in text and Figure 1, but does not include a formally labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | "Our code is available at https://github.com/fc2869/lo-fit."
Open Datasets | Yes | "TruthfulQA [24] is a QA dataset with questions where humans are likely to give false answers because of common misconceptions." (Section 4) "TruthfulQA [24] uses the Apache-2.0 license and data is available at: https://github.com/sylinrl/TruthfulQA." (Appendix I)
Dataset Splits | Yes | "TruthfulQA [24]... We follow the setup in [21] to split TruthfulQA into train/dev/test sets of 326/82/407 questions... CLUTRR [41]... We use the subset of 2-hop questions and randomly split the data into train/dev/test sets of 300/450/450 QA pairs. MQuAKE [58]... Data is randomly split into train/dev/test sets of 134/95/864 QA pairs."
Hardware Specification | Yes | "We fine-tune LOFIT and baselines using a single NVIDIA RTX A6000 GPU with 48GB memory."
Software Dependencies | No | "We use the Hugging Face implementation of Transformers [51] in PyTorch for all fine-tuning, and the TRL [50] implementation of direct preference optimization [37] for fine-tuning on TruthfulQA."
Experiment Setup | Yes | "We fine-tune LOFIT and baselines using a single NVIDIA RTX A6000 GPU with 48GB memory. We use the Hugging Face implementation of Transformers [51] in PyTorch for all fine-tuning, and the TRL [50] implementation of direct preference optimization [37] for fine-tuning on TruthfulQA. We use the AdamW optimizer for fine-tuning [26] with ε = 1e-8 and a weight decay factor of 0.01." (Appendix C.1) "For all experiments... we fine-tuned for 5 epochs with a batch size of 8... Method-specific hyperparameters can be found in the following subsections. Hyperparameters of LOFIT used in each experiment are summarized in Table 6." (Appendix D)
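The dataset-splits row quotes random train/dev/test partitions of fixed sizes (e.g., 300/450/450 for CLUTRR). A minimal sketch of such a split is below; the seed, the shuffle-then-slice procedure, and the helper name `split_dataset` are assumptions for illustration, not details from the paper.

```python
import random

def split_dataset(examples, sizes, seed=0):
    """Shuffle `examples` (seed and procedure are assumed, not from the
    paper) and partition them into consecutive splits of the given sizes."""
    assert sum(sizes) <= len(examples)
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    splits, start = [], 0
    for n in sizes:
        splits.append(shuffled[start:start + n])
        start += n
    return splits

# CLUTRR-style split: 300 train / 450 dev / 450 test QA pairs
train, dev, test = split_dataset(range(1200), (300, 450, 450))
```

Shuffling once and slicing consecutively guarantees the three splits are disjoint, which is the property a reproducibility reviewer would want to verify.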
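The quoted setup fixes three quantities — at most 500 training points, batch size 8, and 5 epochs — which together bound the total number of optimizer steps per run. The back-of-the-envelope arithmetic below combines them; the helper `total_steps` and the step-count framing are illustrative, not from the paper.

```python
import math

def total_steps(num_examples, batch_size=8, epochs=5):
    """ceil(num_examples / batch_size) gradient steps per epoch,
    using the batch size and epoch count quoted in the setup row."""
    return math.ceil(num_examples / batch_size) * epochs

# TruthfulQA train split of 326 questions:
# ceil(326 / 8) = 41 batches per epoch, times 5 epochs = 205 steps
```

At the quoted 500-example ceiling this is at most 315 steps, which illustrates why the authors describe the condition as low-data.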