Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models

Authors: Daiki Chijiwa, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito, Susumu Takeuchi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results, covering both vision and language models, demonstrate that the PRT-trained model can achieve accuracy comparable to existing inference-time tuning work, at lower inference cost. In Section 4, the paper presents "Experiments" with detailed "Results" (Figures 2, 3, 4) and "Memory and Speed Analysis" (Tables 1, 2, 3), indicating empirical evaluation.
Researcher Affiliation | Industry | (1) NTT Computer and Data Science Laboratories, NTT Corporation; (2) NTT Human Informatics Laboratories, NTT Corporation. Correspondence to: Daiki Chijiwa <EMAIL>, Taku Hasegawa <EMAIL>. All authors are affiliated with NTT Corporation, an industry entity, and the email domains are @ntt.com.
Pseudocode | Yes | Algorithm 1: Pseudocode for Training of PRT; Algorithm 2: Pseudocode for Inference of PRT.
Open Source Code | No | The text contains no explicit statement from the authors about releasing their own code for the methodology described in this paper, nor a direct link to a code repository.
Open Datasets | Yes | We employed CLIP models (...) pretrained on various datasets including (...) LAION-400M (Schuhmann et al., 2021), LAION-2B (Schuhmann et al., 2022), and DataComp-1B (Gadre et al., 2024). (...) For each fine-grained dataset, such as Cars (Krause et al., 2013) and CUB (Wah et al., 2011). (...) Aircraft (Maji et al., 2013), Caltech101 (Li et al., 2022), Cars (Krause et al., 2013), CIFAR-100 (Krizhevsky et al., 2009), Country211 (Radford et al., 2021), CUB (Wah et al., 2011), Flowers (Nilsback & Zisserman, 2008), RESISC45 (Cheng et al., 2017). (...) Tulu v2 dataset (Ivison et al., 2023). (...) GSM8K (Cobbe et al., 2021) and IFEval (Zhou et al., 2023).
Dataset Splits | Yes | For each fine-grained dataset, such as Cars (Krause et al., 2013) and CUB (Wah et al., 2011), we first constructed and fixed the classification layer of each pretrained model for zero-shot classification, and then fine-tuned (or reward-tuned) its feature extractor on the train set. (3) Evaluation: We evaluated models on the test set of each dataset where the training set was used for tuning the models.
Hardware Specification | Yes | "All models are trained on a single A100 GPU." "We conducted all training on 8 NVIDIA A100 GPUs." (The two statements appear to describe different experiment groups, consistent with the paper covering both vision and language models.)
Software Dependencies | No | The paper mentions optimizers (Adam) but does not provide specific software library names with version numbers (e.g., PyTorch, TensorFlow) or specific Python versions.
Experiment Setup | Yes | In training of either standard fine-tuning or PRT, we used the same hyperparameters following existing work (Ilharco et al., 2023) as follows: learning rate = 1e-5, batch size = 128, number of iterations = 2000, optimizer = Adam, cosine annealing with 500 warmup iterations. A second set of training conditions is also given: learning rate = 2e-5, batch size = 128, number of epochs = 2, optimizer = Adam, warmup ratio = 0.03, and learning rate scheduler = linear.
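The table notes that the paper provides pseudocode for both training and inference of PRT (Algorithms 1 and 2), though the report itself does not reproduce them. Reward-tuning methods of this family typically combine a pretrained model's output distribution with a separately trained reward at inference time, which amounts to adding a (scaled) reward to the logits before the softmax. The sketch below illustrates that general combination rule only; `prt_inference`, `beta`, and the toy numbers are my own illustrative assumptions, not the paper's exact algorithm.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def prt_inference(pretrained_logits, reward_scores, beta=1.0):
    """Hypothetical sketch of reward-guided inference: reweight the
    pretrained model's distribution by exp(beta * reward), which is
    equivalent to adding beta * reward to the logits."""
    combined = [l + beta * r for l, r in zip(pretrained_logits, reward_scores)]
    return softmax(combined)

# Toy example (made-up numbers): the reward shifts probability
# mass toward the third class.
base = [2.0, 1.0, 0.5]
reward = [0.0, 0.0, 3.0]
probs = prt_inference(base, reward)
```

With a zero reward the rule reduces to the pretrained model's own softmax, which is the property that lets the same reward be reused across different pretrained backbones.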
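The first hyperparameter set quotes "cosine annealing with 500 warmup iterations" over 2000 iterations at a base learning rate of 1e-5. A minimal sketch of such a schedule is below, written in plain Python for clarity (in practice this would be a PyTorch `LambdaLR` or similar); the function name and the linear-warmup-then-cosine-decay-to-zero convention are my assumptions, as the paper's exact scheduler details are not given in the report.

```python
import math

def lr_at_step(step, base_lr=1e-5, warmup=500, total=2000):
    """Linear warmup for `warmup` steps, then cosine decay from
    base_lr down to 0 at `total` steps (a common convention; the
    paper's exact schedule may differ)."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, the learning rate rises linearly from 0 to 1e-5 over the first 500 steps, peaks exactly at step 500, and decays along a half-cosine to 0 at step 2000.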