Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
Authors: Daiki Chijiwa, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito, Susumu Takeuchi
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results, covering both vision and language models, demonstrate that the PRT-trained model can achieve comparable accuracy to the existing work of inference-time tuning, with less inference cost. In Section 4, the paper presents 'Experiments' with detailed 'Results' (Figures 2, 3, 4) and 'Memory and Speed Analysis' (Tables 1, 2, 3), indicating empirical evaluation. |
| Researcher Affiliation | Industry | 1NTT Computer and Data Science Laboratories, NTT Corporation 2NTT Human Informatics Laboratories, NTT Corporation. Correspondence to: Daiki Chijiwa <EMAIL>, Taku Hasegawa <EMAIL>. All authors are affiliated with NTT Corporation, an industry entity, and the email domains are @ntt.com. |
| Pseudocode | Yes | Algorithm 1 Pseudocode for Training of PRT. Algorithm 2 Pseudocode for Inference of PRT. |
| Open Source Code | No | The text does not contain an explicit statement from the authors about releasing their own code for the methodology described in this paper, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We employed CLIP models (...) pretrained on various datasets including (...) LAION-400M (Schuhmann et al., 2021), LAION-2B (Schuhmann et al., 2022), and Data Comp-1B (Gadre et al., 2024). (...) For each fine-grained dataset, such as Cars (Krause et al., 2013) and CUB (Wah et al., 2011). (...) Aircraft (Maji et al., 2013), Caltech101 (Li et al., 2022), Cars (Krause et al., 2013), CIFAR-100 (Krizhevsky et al., 2009), Country211 (Radford et al., 2021), CUB (Wah et al., 2011), Flowers (Nilsback & Zisserman, 2008), RESISC45 (Cheng et al., 2017). (...) Tulu v2 dataset (Ivison et al., 2023). (...) GSM8K (Cobbe et al., 2021) and IFEval (Zhou et al., 2023). |
| Dataset Splits | Yes | For each fine-grained dataset, such as Cars (Krause et al., 2013) and CUB (Wah et al., 2011), we first constructed and fixed the classification layer of each pretrained model for zeroshot classification, and then fine-tuned (or reward-tuned) its feature extractor on the train set. (3) Evaluation: We evaluated models on the test set of each dataset where the training set was used for tuning the models. |
| Hardware Specification | Yes | All models are trained on a single A100 GPU. (...) We conducted all training on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions optimizers (Adam) but does not provide specific software library names with version numbers (e.g., PyTorch, TensorFlow) or specific Python versions. |
| Experiment Setup | Yes | In training of either standard fine-tuning or PRT, we used the same hyperparameters following existing work (Ilharco et al., 2023) as follows: learning rate = 1e-5, batch size = 128, number of iterations = 2000, optimizer = Adam, cosine annealing with 500 warmup iterations. (...) The training conditions are as follows: learning rate = 2e-5, batch size = 128, number of epochs = 2, optimizer = Adam, warmup ratio = 0.03, and learning rate scheduler = linear. |
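The warmup-then-cosine schedule quoted in the Experiment Setup row can be sketched as a small, self-contained helper. The hyperparameter values (lr = 1e-5, 2000 iterations, 500 warmup iterations) are taken from the paper's quoted vision-experiment settings; the schedule function itself is a common linear-warmup + cosine-annealing implementation assumed for illustration, not the authors' actual code.

```python
import math

# Values quoted from the paper's vision-experiment setup
# (following Ilharco et al., 2023); the function below is an
# assumed, standard implementation of the named schedule.
BASE_LR = 1e-5
WARMUP_ITERS = 500
TOTAL_ITERS = 2000

def lr_at(step: int) -> float:
    """Learning rate at a given iteration: linear warmup for the
    first WARMUP_ITERS steps, then cosine annealing to zero."""
    if step < WARMUP_ITERS:
        # Linearly ramp from BASE_LR / WARMUP_ITERS up to BASE_LR.
        return BASE_LR * (step + 1) / WARMUP_ITERS
    # Cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at(s) for s in range(TOTAL_ITERS)]
```

In a real training loop this would typically be handled by an optimizer-attached scheduler (e.g. a cosine-annealing scheduler with warmup in a deep-learning framework); the standalone function just makes the quoted hyperparameters concrete.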