TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs

Authors: Felipe Pinto Coelho Nuti, Tim Franzmeyer, Joao F. Henriques

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, we find that one can steer model behavior and performance by up- or down-scaling the fine-tuning component during the forward pass. [...] We empirically validate that TuCo is indeed much lower for pre-training-like inputs from the OpenWebText dataset [...] We then investigate how three prominent jailbreaking techniques affect the Tuning Contribution. [...] We compute the Tuning Contribution as described in Algorithm 1. We explain all experiments in more detail in the Appendix and make all code available publicly.
Researcher Affiliation Academia 1University of Oxford. Correspondence to: Felipe Nuti <EMAIL>.
Pseudocode Yes Algorithm 1 Computation of Tuning Contribution (TuCo)
Input: pre-trained model T^PT_φ, fine-tuned model T^FT_Θ, prompt s
  x_0 ← Embed(Tokenizer(s))  {Tokenize and embed prompt}
  IFTC, IPTC ← 0  {Initialize cumulative contributions}
  for l = 0 to L − 1 do
    PTC_l ← f^PT_φ(x_l, l)  {Compute PTC for layer l}
    FTC_l ← f^FT_Θ(x_l, l) − PTC_l  {Compute FTC for layer l}
    x_{l+1} ← x_l + PTC_l + FTC_l  {Update x for next layer}
    IFTC ← IFTC + FTC_l[−1]  {Accumulate last-token FTC}
    IPTC ← IPTC + PTC_l[−1]  {Accumulate last-token PTC}
  end for
  TuCo ← ‖IFTC‖ / (‖IPTC‖ + ‖IFTC‖)  {Compute TuCo}
Return: TuCo
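A minimal sketch of Algorithm 1 in numpy. The layer interface is assumed, not taken from the released code: `pt_layers` and `ft_layers` are per-layer functions returning the residual update each block adds to hidden states of shape (seq_len, d_model). Whether norms are taken per layer or on the accumulated vectors is an implementation detail of the paper; this sketch accumulates vectors and takes norms of the totals, matching the ratio form of TuCo.

```python
import numpy as np

def tuco(pt_layers, ft_layers, x0):
    """Sketch of TuCo: ratio of the fine-tuning contribution to the total
    contribution at the last token position, accumulated over all layers."""
    x = x0
    iftc = np.zeros_like(x0[-1])  # cumulative last-token FTC
    iptc = np.zeros_like(x0[-1])  # cumulative last-token PTC
    for f_pt, f_ft in zip(pt_layers, ft_layers):
        ptc = f_pt(x)         # pre-training component of the layer update
        ftc = f_ft(x) - ptc   # fine-tuning component (the difference)
        x = x + ptc + ftc     # standard residual update of the FT model
        iftc = iftc + ftc[-1]
        iptc = iptc + ptc[-1]
    return np.linalg.norm(iftc) / (np.linalg.norm(iptc) + np.linalg.norm(iftc))
```

With identical layer updates for both models the ratio degenerates to 0 (no fine-tuning contribution); TuCo always lies in [0, 1].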
Open Source Code Yes 2Code is available at http://github.com/FelipeNuti/tuning-contribution.
Open Datasets Yes Empirically, we also find that scaling the magnitude of the fine-tuning component controls model behaviors and capabilities. Specifically, scaling of the FTC results in as much as 5% test-set performance improvements for tasks of the MMLU benchmark (Hendrycks et al., 2020). [...] We empirically validate that TuCo is indeed much lower for pre-training-like inputs from the OpenWebText dataset (Gokaslan & Cohen, 2019) than for chat-like inputs from a dataset designed for harmless and helpful model behavior (Bai et al., 2022a; Ganguli et al., 2022). [...] We construct a dataset consisting of the harmful instructions from the AdvBench benchmark (Zou et al., 2023b) in English, Japanese, Hungarian, Swahili and Malayalam.
Dataset Splits Yes We use 5-fold cross-validation, and report the change in out-of-sample average accuracy CV(D), averaged across folds of a dataset D. [...] To evaluate how much we can increase model accuracy by choosing α appropriately, we first evenly divide D into K = 5 folds D1, …, DK.
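The quoted cross-validation procedure can be sketched as follows. `accuracy_fn` and the example set are hypothetical stand-ins (not from the paper's code) for evaluating the FTC-scaled model on a task at a given α:

```python
import numpy as np

def cv_accuracy_gain(examples, accuracy_fn, alphas, k=5, seed=0):
    """Pick alpha on k-1 folds, measure the out-of-sample accuracy change
    (vs. alpha = 1, the unmodified fine-tuned model) on the held-out fold,
    and average the gain across folds -- the CV(D) quantity above."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(examples)), k)
    gains = []
    for i in range(k):
        held_out = [examples[j] for j in folds[i]]
        held_in = [examples[j] for f in folds[:i] + folds[i + 1:] for j in f]
        # select alpha on the held-in folds...
        alpha_star = max(alphas, key=lambda a: accuracy_fn(a, held_in))
        # ...and score the out-of-sample change on the held-out fold
        gains.append(accuracy_fn(alpha_star, held_out) - accuracy_fn(1.0, held_out))
    return float(np.mean(gains))
```

Reporting the change relative to α = 1 isolates the benefit of the intervention from the base accuracy of the fine-tuned model.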
Hardware Specification No The paper mentions 'open-source models of up to 13B parameters' and 'GPU memory and running time constraints' in relation to MMLU tasks, but does not provide specific details on the GPU models, CPU models, or other hardware specifications used for their experiments.
Software Dependencies No The paper does not explicitly list software dependencies with specific version numbers.
Experiment Setup Yes We modulate the magnitude of the fine-tuning component FTC throughout the forward pass, and study to what extent model performance and behavior can be controlled via this modulation. [...] We evaluate the impact of scaling α between 0.75 and 1.25 on model outputs [...] We next optimize accuracy for each task and behavior using a grid search for α ∈ {0.75, 0.9, 0.95, 1.0, 1.05, 1.1, 1.25}.
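The α-scaling intervention described above can be sketched under the same assumed per-layer interface as the TuCo decomposition (function names are illustrative, not from the released code): at every layer the fine-tuning component of the update is multiplied by α before re-entering the residual stream.

```python
import numpy as np

def forward_with_scaled_ftc(pt_layers, ft_layers, x0, alpha=1.0):
    """Forward pass with the fine-tuning component scaled by alpha.
    alpha = 1 recovers the fine-tuned model's forward pass;
    alpha = 0 recovers the pre-trained model's forward pass."""
    x = x0
    for f_pt, f_ft in zip(pt_layers, ft_layers):
        ptc = f_pt(x)         # pre-training component of the layer update
        ftc = f_ft(x) - ptc   # fine-tuning component
        x = x + ptc + alpha * ftc
    return x
```

Sweeping α over the grid quoted above then amounts to rerunning this forward pass per candidate value and scoring the outputs.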