Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Authors: Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate PEFT computation costs.
Researcher Affiliation | Academia | Zeyu Han (EMAIL), Northeastern University; Chao Gao (EMAIL), University of California, Riverside; Jinyang Liu (EMAIL), Northeastern University; Jeff (Jun) Zhang (EMAIL), Arizona State University; Sai Qian Zhang (EMAIL), New York University
Pseudocode | No | The paper describes various PEFT algorithms and their operations using mathematical equations and textual descriptions, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper refers to existing open-source libraries such as Hugging Face's PEFT (Mangrulkar et al., 2022) and AdapterHub (Poth et al., 2023) as part of its discussion, but it does not provide concrete access to source code for any methodology described or proposed by the authors of this survey paper.
Open Datasets | Yes | Two types of tasks have been widely used for LLM evaluation. The first is the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018)... The other type of dataset used in recent LLM papers is commonsense reasoning, which caters to a variety of research facets: (1) OpenBookQA (Mihaylov et al., 2018)... (2) PIQA (Bisk et al., 2020)... (3) Social IQA (Sap et al., 2019)... (4) HellaSwag (Zellers et al., 2019)... (5) BoolQ (Clark, 2019)... (6) WinoGrande (Sakaguchi et al., 2021)... (7) ARC-easy (Clark et al., 2018)... (8) ARC-challenge (Clark et al., 2018)... Image recognition serves as a key benchmark and application for vision models... including datasets like Kinetics-400 (Kay et al., 2017), SSv2 (Goyal et al., 2017), and HMDB51 (Kuehne et al., 2011). Additionally, PEFT has been utilized for dense prediction tasks, using datasets like MSCOCO (Lin et al., 2014), ADE20K (Zhou et al., 2017), and PASCAL VOC (Everingham et al., 2010).
Dataset Splits | No | The paper is a survey that discusses various datasets and benchmarks used in the field of PEFT. While it lists many datasets (e.g., the GLUE benchmark, OpenBookQA, PIQA, MSCOCO), it does not describe any specific training, test, or validation splits for these datasets within the context of experiments conducted by the authors.
Hardware Specification | No | The paper is a survey and does not report on original experiments conducted by its authors. While it references hardware specifications from cited works, such as 'a single 48GB GPU' for QLoRA and 'a single V100 with 32GB memory' for LongQLoRA, it does not provide any specific hardware details for experiments performed by the authors of this paper.
Software Dependencies | No | The paper is a comprehensive survey and does not describe any experimental setup or procedures performed by the authors. Therefore, it does not list specific software dependencies with version numbers for its own work.
Experiment Setup | No | The paper is a comprehensive survey and does not describe any experimental setup or procedures performed by the authors. Therefore, it does not provide specific hyperparameter values or system-level training settings.
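For context on what the surveyed PEFT algorithms compute, the following is a minimal LoRA-style sketch in NumPy. It is illustrative only: the dimensions, variable names, and initialization are assumptions for this sketch, not code from the paper or from any of the libraries it cites.

```python
import numpy as np

# Illustrative LoRA-style adapter: a frozen weight W is augmented by a
# low-rank product B @ A, so only r*(d_in + d_out) parameters are trained
# instead of the full d_in * d_out.
rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (init 0)

def lora_forward(x):
    # Frozen path plus the low-rank adapter path.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapter starts as an exact no-op.
assert np.allclose(lora_forward(x), W @ x)

frozen = W.size
trainable = A.size + B.size
print(f"trainable: {trainable}, frozen: {frozen}")
```

With these dimensions the adapter trains 512 parameters against 4096 frozen ones, which is the parameter-efficiency trade-off the surveyed methods exploit.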