Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Authors: Minh Le, Chau Nguyen, Huy Nguyen, Quyen Tran, Trung Le, Nhat Ho
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study demonstrates that reparameterization is not merely an engineering trick but is grounded in deep theoretical foundations. Specifically, we show that the reparameterization strategy implicitly encodes a shared structure between prefix key and value vectors. Building on recent insights into the connection between prefix-tuning and mixture of experts models, we further illustrate that this shared structure significantly improves sample efficiency in parameter estimation compared to non-shared alternatives. Extensive experiments in both visual and language domains empirically confirm that the shared structure enhances the effectiveness of prefix-tuning across diverse tasks. |
| Researcher Affiliation | Collaboration | Minh Le (1), Chau Nguyen (1), Huy Nguyen (2), Quyen Tran (1), Trung Le (3), Nhat Ho (2); (1) Qualcomm AI Research, (2) The University of Texas at Austin, (3) Monash University |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. The theoretical sections describe mathematical formulations, but these are not structured as pseudocode. |
| Open Source Code | No | The paper states: 'In order to facilitate the reproduction of our empirical results, we provide detailed descriptions of the experimental setup in Section 5.1 and Appendix E. All datasets used in this study are publicly available, enabling full replication of our experiments.' This statement refers to experimental setup and datasets but does not explicitly mention releasing source code for the methodology described in this paper. |
| Open Datasets | Yes | For visual tasks, we use the FGVC and VTAB-1K (Zhai et al., 2019) benchmarks. FGVC includes five Fine-Grained Visual Classification datasets: CUB-200-2011 (Wah et al., 2011), NABirds (Van Horn et al., 2015), Oxford Flowers (Nilsback & Zisserman, 2008), Stanford Dogs (Khosla et al., 2011), and Stanford Cars (Gebru et al., 2017). |
| Dataset Splits | Yes | Each VTAB-1K task contains 1,000 training examples. We follow the protocol from VPT (Jia et al., 2022) to perform the split of the train, validation, and test sets. Table 4 indicates specific splits such as: 'CUB-200-2011 (Wah et al., 2011) ... Train 5,394 Val 600 Test 5,794' and 'CIFAR-100 (Krizhevsky et al., 2009) ... Train 800/1000 Val 200'. |
| Hardware Specification | Yes | All experiments were implemented in PyTorch (Paszke et al., 2017) and executed on NVIDIA A100-40GB GPUs. |
| Software Dependencies | No | All experiments were implemented in PyTorch (Paszke et al., 2017) and executed on NVIDIA A100-40GB GPUs. This mentions PyTorch but does not specify its version number or versions for any other key software dependencies. |
| Experiment Setup | Yes | Following Jia et al. (2022), we perform a grid search to determine optimal hyperparameters, specifically learning rates from the set [50, 25, 10, 5, 2.5, 1, 0.5, 0.25, 0.1, 0.05] and weight decay values from [0.01, 0.001, 0.0001, 0.0]... The SGD optimizer is utilized for 100 epochs, incorporating a linear warm-up during the initial 10 epochs, followed by a cosine learning rate schedule. We report the average test set accuracy across five independent runs, maintaining consistent batch size settings of 64 and 128. |
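The paper's central claim is that prefix key and value vectors are generated through a shared reparameterization network rather than trained independently. The following is a minimal NumPy sketch of that idea; the dimensions, weight names, and single-hidden-layer architecture are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper).
num_prefix, d_model, d_hidden = 10, 64, 128

# One shared trainable embedding per prefix position.
embedding = rng.standard_normal((num_prefix, d_model))

# Shared trunk: both key and value prefixes pass through the same hidden
# layer, which is what encodes the shared structure the paper analyzes.
W_trunk = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
hidden = np.tanh(embedding @ W_trunk)

# Separate lightweight heads then produce the prefix keys and values,
# so the two sets of vectors are coupled through the shared trunk.
W_key = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)
W_value = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)
prefix_keys = hidden @ W_key
prefix_values = hidden @ W_value

print(prefix_keys.shape, prefix_values.shape)
```

The contrast with the "non-shared alternative" discussed in the paper would be training `prefix_keys` and `prefix_values` as two independent parameter matrices, with no common `embedding` or `W_trunk` linking them.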
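The grid search quoted in the experiment setup can be sketched as follows. The learning-rate and weight-decay grids are the ones reported in the paper; the `validate` function is a hypothetical stand-in for a full training-plus-validation run.

```python
import itertools

# Hyperparameter grids quoted in the paper's setup.
learning_rates = [50, 25, 10, 5, 2.5, 1, 0.5, 0.25, 0.1, 0.05]
weight_decays = [0.01, 0.001, 0.0001, 0.0]

def validate(lr, wd):
    """Hypothetical stand-in for training a model with (lr, wd) and
    returning its validation score; a real run would train for 100
    epochs with SGD, linear warm-up, and a cosine schedule."""
    return -abs(lr - 0.5) - wd  # dummy score for illustration only

# Exhaustively evaluate every (lr, weight_decay) pair and keep the best.
best = max(itertools.product(learning_rates, weight_decays),
           key=lambda cfg: validate(*cfg))
print(best)  # → (0.5, 0.0) under the dummy score above
```

With the dummy score, the search selects the pair closest to the arbitrary optimum it encodes; in the actual protocol, each configuration would be scored by validation accuracy and the winner retrained for the reported five runs.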