Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Authors: Minh Le, Chau Nguyen, Huy Nguyen, Quyen Tran, Trung Le, Nhat Ho
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study demonstrates that reparameterization is not merely an engineering trick but is grounded in deep theoretical foundations. Specifically, we show that the reparameterization strategy implicitly encodes a shared structure between prefix key and value vectors. Building on recent insights into the connection between prefix-tuning and mixture of experts models, we further illustrate that this shared structure significantly improves sample efficiency in parameter estimation compared to non-shared alternatives. Extensive experiments in both visual and language domains empirically confirm that the shared structure enhances the effectiveness of prefix-tuning across diverse tasks. |
| Researcher Affiliation | Collaboration | Minh Le (1), Chau Nguyen (1), Huy Nguyen (2), Quyen Tran (1), Trung Le (3), Nhat Ho (2); (1) Qualcomm AI Research, (2) The University of Texas at Austin, (3) Monash University |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. The theoretical sections describe mathematical formulations, but these are not structured as pseudocode. |
| Open Source Code | No | The paper states: 'In order to facilitate the reproduction of our empirical results, we provide detailed descriptions of the experimental setup in Section 5.1 and Appendix E. All datasets used in this study are publicly available, enabling full replication of our experiments.' This statement refers to experimental setup and datasets but does not explicitly mention releasing source code for the methodology described in this paper. |
| Open Datasets | Yes | For visual tasks, we use the FGVC and VTAB-1K (Zhai et al., 2019) benchmarks. FGVC includes five Fine-Grained Visual Classification datasets: CUB-200-2011 (Wah et al., 2011), NABirds (Van Horn et al., 2015), Oxford Flowers (Nilsback & Zisserman, 2008), Stanford Dogs (Khosla et al., 2011), and Stanford Cars (Gebru et al., 2017). |
| Dataset Splits | Yes | Each VTAB-1K task contains 1,000 training examples. We follow the protocol from VPT (Jia et al., 2022) to perform the split of the train, validation, and test sets. Table 4 indicates specific splits such as: 'CUB-200-2011 (Wah et al., 2011) ... Train 5,394 Val 600 Test 5,794' and 'CIFAR-100 (Krizhevsky et al., 2009) ... Train 800/1000 Val 200'. |
| Hardware Specification | Yes | All experiments were implemented in PyTorch (Paszke et al., 2017) and executed on NVIDIA A100-40GB GPUs. |
| Software Dependencies | No | All experiments were implemented in PyTorch (Paszke et al., 2017) and executed on NVIDIA A100-40GB GPUs. This mentions PyTorch but does not specify its version number or versions for any other key software dependencies. |
| Experiment Setup | Yes | Following Jia et al. (2022), we perform a grid search to determine optimal hyperparameters, specifically learning rates from the set [50, 25, 10, 5, 2.5, 1, 0.5, 0.25, 0.1, 0.05] and weight decay values from [0.01, 0.001, 0.0001, 0.0]... The SGD optimizer is utilized for 100 epochs, incorporating a linear warm-up during the initial 10 epochs, followed by a cosine learning rate schedule. We report the average test set accuracy across five independent runs, maintaining consistent batch size settings of 64 and 128. |
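The paper's central claim is that prefix key and value vectors are generated through a shared reparameterization network rather than trained independently. The following is a minimal NumPy sketch of that idea; the dimensions, weight names, and single-hidden-layer architecture are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper).
num_prefix, d_model, d_hidden = 10, 64, 128

# One shared trainable embedding per prefix position.
embedding = rng.standard_normal((num_prefix, d_model))

# Shared trunk: both key and value prefixes pass through the same hidden
# layer, which is what encodes the shared structure the paper analyzes.
W_trunk = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
hidden = np.tanh(embedding @ W_trunk)

# Separate lightweight heads then produce the prefix keys and values,
# so the two sets of vectors are coupled through the shared trunk.
W_key = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)
W_value = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)
prefix_keys = hidden @ W_key
prefix_values = hidden @ W_value

print(prefix_keys.shape, prefix_values.shape)
```

The contrast with the "non-shared alternative" discussed in the paper would be training `prefix_keys` and `prefix_values` as two independent parameter matrices, with no common `embedding` or `W_trunk` linking them.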
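The grid search quoted in the experiment setup can be sketched as follows. The learning-rate and weight-decay grids are the ones reported in the paper; the `validate` function is a hypothetical stand-in for a full training-plus-validation run.

```python
import itertools

# Hyperparameter grids quoted in the paper's setup.
learning_rates = [50, 25, 10, 5, 2.5, 1, 0.5, 0.25, 0.1, 0.05]
weight_decays = [0.01, 0.001, 0.0001, 0.0]

def validate(lr, wd):
    """Hypothetical stand-in for training a model with (lr, wd) and
    returning its validation score; a real run would train for 100
    epochs with SGD, linear warm-up, and a cosine schedule."""
    return -abs(lr - 0.5) - wd  # dummy score for illustration only

# Exhaustively evaluate every (lr, weight_decay) pair and keep the best.
best = max(itertools.product(learning_rates, weight_decays),
           key=lambda cfg: validate(*cfg))
print(best)  # → (0.5, 0.0) under the dummy score above
```

With the dummy score, the search selects the pair closest to the arbitrary optimum it encodes; in the actual protocol, each configuration would be scored by validation accuracy and the winner retrained for the reported five runs.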