ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

Authors: Pengwei Tang, Xiaolin Hu, Yong Liu

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | In comprehensive experiments across 23 natural language processing tasks and 4 typical PLMs of different scales, ADePT consistently surpasses the other leading parameter-efficient fine-tuning methods, and even outperforms full fine-tuning in certain scenarios. We also provide a theoretical analysis towards ADePT. Code is available at https://github.com/HungerPWAY/ADePT.
Researcher Affiliation | Academia | Pengwei Tang, Xiaolin Hu, Yong Liu; Renmin University of China, Beijing, China
Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations but does not include any explicit 'Pseudocode' or 'Algorithm' labeled blocks or figures.
Open Source Code | Yes | Code is available at https://github.com/HungerPWAY/ADePT.
Open Datasets | Yes | We consider four benchmarks and 4 other datasets: (1) the GLUE benchmark (Wang et al., 2018), which includes MNLI (Williams et al., 2018), QQP, QNLI (Rajpurkar et al., 2016), SST-2 (Socher et al., 2013), STS-B (Cer et al., 2017), MRPC (Dolan & Brockett, 2005), RTE (Giampiccolo et al., 2007) and CoLA (Warstadt et al., 2019); (2) the SuperGLUE benchmark (Wang et al., 2019), which includes MultiRC (Khashabi et al., 2018), BoolQ (Clark et al., 2019), WiC (Pilehvar & Camacho-Collados, 2019), WSC (Levesque et al., 2012), CB (De Marneffe et al., 2019) and ReCoRD (Zhang et al., 2018); (3) the MRQA 2019 Shared Task (Fisch et al., 2019), which includes Natural Questions (Kwiatkowski et al., 2019), HotpotQA (Yang et al., 2018), SearchQA (Dunn et al., 2017) and NewsQA (Trischler et al., 2017); (4) the MBPP benchmark (Austin et al., 2021), which is a code generation task; (5) other datasets, which include WinoGrande (Sakaguchi et al., 2021), Yelp-2 (Zhang et al., 2015), SciTail (Khot et al., 2018) and PAWS-Wiki (Zhang et al., 2019).
Dataset Splits | Yes | For the MBPP benchmark, following Jain et al. (2024), we use a 50-50 split for training and test. Table 16: The datasets assessed in this study are described as follows. The term Train refers to the number of samples in the training set, whereas Valid and Test indicate the number of samples in the validation set and test set, respectively. (Table 16 then lists specific Train, Valid, Test counts for all datasets.)
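The 50-50 MBPP train/test split mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual code: the helper name, the fixed seed, and the example count of 974 MBPP problems are assumptions for demonstration.

```python
import random

def fifty_fifty_split(examples, seed=0):
    """Hypothetical helper: shuffle a dataset and split it 50-50
    into train and test halves, as described for MBPP
    (following Jain et al., 2024)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    half = len(indices) // 2
    train = [examples[i] for i in indices[:half]]
    test = [examples[i] for i in indices[half:]]
    return train, test

# Example with 974 placeholder problems (an assumed count):
# 487 land in train, 487 in test, and no example is dropped.
train, test = fifty_fifty_split(list(range(974)))
```

Shuffling before splitting avoids any ordering bias in the original dataset file; the fixed seed keeps the split identical across runs.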
Hardware Specification | No | The paper mentions 'GPU resources' and 'computational resources' in general terms, but does not provide specific details on the hardware used, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | We implement our experiments by using PyTorch, Huggingface Transformers, and Huggingface PEFT. (The paper cites these libraries via footnoted URLs and does not give specific version numbers.)
Experiment Setup | Yes | Following Shi & Lipani (2024), we use 100 learnable virtual tokens as the soft prompt of PT. For our proposed ADePT, we adjust the hyperparameters to maintain an equivalent number of trainable parameters as PT... For the T5-base model, the token embedding dimension d is 768... we search the length of the soft prompt from 20, 40, 60, and 80... For the T5-base model, we directly quote performance metrics from published papers... For the T5-3B model, we consistently use 60 virtual tokens and bottleneck size r = 19... For small datasets (< 70,000 training samples) based on the T5 model, we follow the learning strategy of Shi & Lipani (2024): we search the learning rate for the soft prompt from 3e-1, 4e-1, 5e-1, and for the feed-forward neural network from 1e-4, 1e-5. For large datasets (> 70,000 training samples) based on the T5 model, we use learning rate 3e-1 for the soft prompt and 1e-4 for the feed-forward neural networks. For the MBPP benchmark, following Jain et al. (2024), we use learning rates of 1e-3 for the prompting-style tuning method and 1e-4 for LoRA. Appendix E provides detailed hyperparameters in Table 17, Table 18, and Table 19, including 'number of steps', 'batch size', 'maximum learning rate', 'length of the soft prompt', 'maximum sequence length', 'learning rate optimizer AdamW', 'Adam epsilon', 'Adam beta weights', 'learning rate scheduler Warmup linear', 'Weight decay', and 'Warmup steps'.
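The hyperparameter search described above can be enumerated as a small grid. This is a sketch of the search space as quoted in the report, not the authors' tuning code; the function names and dictionary keys are hypothetical.

```python
from itertools import product

# Search space quoted for small (< 70,000 training samples) T5-base
# datasets: soft-prompt LR, feed-forward-network LR, and prompt length
# are searched independently.
PROMPT_LRS = [3e-1, 4e-1, 5e-1]
FFN_LRS = [1e-4, 1e-5]
PROMPT_LENGTHS = [20, 40, 60, 80]

def small_dataset_grid():
    """Enumerate every (prompt_lr, ffn_lr, prompt_len) combination."""
    return [
        {"prompt_lr": p, "ffn_lr": f, "prompt_len": n}
        for p, f, n in product(PROMPT_LRS, FFN_LRS, PROMPT_LENGTHS)
    ]

def large_dataset_config():
    """Large datasets (> 70,000 samples) use fixed rates per the report."""
    return {"prompt_lr": 3e-1, "ffn_lr": 1e-4}

grid = small_dataset_grid()  # 3 prompt LRs x 2 FFN LRs x 4 lengths = 24 runs
```

Separating the two learning rates reflects ADePT's design, where the soft prompt and the token-wise feed-forward network are distinct trainable components with very different scales.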