Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Authors: Jeonghoon Kim, Jung Hyun Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, Se Jung Kwon, Dongsoo Lee
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically validate the effectiveness of our proposed PEQA method by examining its performance in both parameter-efficient fine-tuning (PEFT) and as a quantization method. We achieve this goal by using a series of benchmarks [52, 57], datasets [51, 58, 59], and LLMs [4, 6, 60, 61] that have been publicly introduced. |
| Researcher Affiliation | Collaboration | Jeonghoon Kim (NAVER Cloud), Jung Hyun Lee (NAVER Cloud), Sungdong Kim (NAVER Cloud, KAIST AI), Joonsuk Park (NAVER Cloud, NAVER AI Lab, University of Richmond), Kang Min Yoo (NAVER Cloud, SNU AI Center), Se Jung Kwon (NAVER Cloud), Dongsoo Lee (NAVER Cloud) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | We utilize the Huggingface repository [66] for training, evaluation code and dataset. |
| Open Datasets | Yes | We fine-tune and assess LLMs on the Wikitext2 [51] and Penn Tree Bank (PTB) [58] datasets using PEQA and LoRA [21]. |
| Dataset Splits | No | The paper does not provide specific data split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | To provide a clear understanding of these benefits, we conducted tests using a single NVIDIA A100-80GB GPU and the causal language modeling code from the Hugging Face repository. |
| Software Dependencies | No | For the common experimental settings, AdamW [64] optimizer and linear-decaying learning rate scheduler were used. We use the Deepspeed repository [65] for FP16 and BF16 training. Additionally, we utilize the Huggingface repository [66] for training, evaluation code and dataset. |
| Experiment Setup | Yes | Batch size and epoch for all experiments are set to 128 and 15 respectively. The learning rates for the experiments of Table 2 are displayed in Table 8. |
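The common settings quoted in the table (AdamW optimizer, linear-decaying learning rate scheduler, batch size 128, 15 epochs) imply a learning-rate schedule of the following shape. This is a minimal sketch: `LR0` and `steps_per_epoch` are illustrative placeholders, not values from the paper (the per-experiment learning rates are reported in the paper's Table 8).

```python
def linear_decay_lr(lr0: float, step: int, total_steps: int) -> float:
    """Linearly decay the learning rate from lr0 down to 0 over total_steps."""
    return lr0 * max(0.0, 1.0 - step / total_steps)

BATCH_SIZE, EPOCHS = 128, 15   # values reported in the paper
LR0 = 2e-4                     # placeholder; actual LRs are per-experiment (Table 8)
steps_per_epoch = 100          # illustrative; depends on dataset size / batch size
total_steps = EPOCHS * steps_per_epoch

# Learning rate used at each optimizer step over the full run.
schedule = [linear_decay_lr(LR0, t, total_steps) for t in range(total_steps)]
```

The schedule starts at `LR0` and decreases linearly to zero, matching the "linear-decaying learning rate scheduler" described in the common experimental settings.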
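The "sub-4-bit integer quantization" named in the title can be illustrated with a generic round-to-nearest integer quantizer at 3 bits. This is a hedged sketch of the general technique only; the row-wise layout, function names, and asymmetric min/max scheme here are assumptions for illustration, not PEQA's exact quantization method, which is defined in the paper.

```python
def quantize_rows(w, bits=3):
    """Asymmetric round-to-nearest integer quantization with one (scale, offset)
    pair per row. Generic illustration of sub-4-bit quantization, not PEQA itself."""
    levels = 2 ** bits - 1  # e.g. 7 representable steps at 3 bits
    out = []
    for row in w:
        lo, hi = min(row), max(row)
        scale = (hi - lo) / levels or 1.0  # avoid zero scale for constant rows
        q = [min(levels, max(0, round((x - lo) / scale))) for x in row]
        out.append((q, scale, lo))        # integers in [0, levels] plus metadata
    return out

def dequantize_rows(qrows):
    """Reconstruct approximate floating-point weights from integer codes."""
    return [[v * scale + lo for v in q] for q, scale, lo in qrows]
```

Round-to-nearest bounds the per-weight reconstruction error by half a quantization step (`scale / 2`), which is why the sketch below checks exactly that.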