Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning

Authors: Chongjie Si, Zhiyi Shi, Shifan Zhang, Xiaokang Yang, Hanspeter Pfister, Wei Shen

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "Extensive experiments have conclusively demonstrated the effectiveness of LoRA-Dash, and in-depth analyses further reveal the underlying mechanisms of LoRA-Dash. To further explore the properties of TSDs, we fully fine-tune LLaMA-7B (Touvron et al., 2023a) on commonsense reasoning tasks."
Researcher Affiliation: Academia. "1 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; 2 Harvard University."
Pseudocode: Yes. "The algorithm of LoRA-Dash is shown in Fig. 6. Figure 6: Pseudo code of LoRA-Dash."
Open Source Code: No. The paper contains no explicit statement or link indicating that the authors have released the source code for the described methodology.
Open Datasets: Yes. "The commonsense reasoning benchmarks consist of 8 distinct sub-tasks, each with its designated dataset, i.e., BoolQ (Clark et al., 2019), PIQA (Bisk et al., 2020), SIQA (Sap et al., 2019), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC-e/ARC-c (Clark et al., 2018), OBQA (Mihaylov et al., 2018). For the natural language understanding (NLU) task, we adopt the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018)... We use the SDXL model (Podell et al., 2023), applying both the LoRA and LoRA-Dash methods for fine-tuning. We mainly adopt the official data of DreamBooth (Ruiz et al., 2023) for diffusion."
Dataset Splits: Yes. "Adhering to the protocol outlined by Hu et al. (2023), we merge the training datasets from all tasks to form a comprehensive training dataset (the Commonsense170K dataset), subsequently performing evaluations against each individual task's testing set. The details of these datasets are shown in Table 16." Table 16 ("Details of GLUE dataset") reports, per dataset: task, # Train, # Dev, # Test, # Labels, and metrics.
Hardware Specification: Yes. "All methods are implemented using the publicly available PyTorch (Paszke et al., 2019) framework, and all experiments are conducted on NVIDIA A100 GPUs. We have also verified that most of the experiments can be run on a single consumer GPU such as an NVIDIA RTX 3090. The fine-tuning process is conducted with a learning rate of 1e-4 and a batch size of 4. We train the model over 500 steps on a single 80GB A100 GPU, taking approximately 23 minutes to complete."
Software Dependencies: No. "All methods are implemented using the publicly available PyTorch (Paszke et al., 2019) framework." The paper mentions PyTorch but does not specify a version number.
Experiment Setup: Yes. "The hyper-parameter t of LoRA-Dash is set to 100, and s = 8. The hyper-parameter settings of LoRA-Dash are shown in Table 7."

Table 7: Hyper-parameter settings of LoRA-Dash on the commonsense reasoning task.

                   LLaMA-7B                           LLaMA2-7B       LLaMA3-8B
    Rank r         4     8     16    32    64         16    32        16    32
    alpha          8     16    32    64    128        32    64        32    64
    LR             5e-4  4e-4  5e-4  1e-4  0.9e-4     2e-4  1e-4      2e-4  0.8e-4
    LR Scheduler   Linear
    Dropout        0.05
    Optimizer      AdamW
    Batch size     16
    Warmup Steps   100
    Where          Q, K, V, Up, Down

"The hyper-parameter settings for this task are shown in Table 17."

Table 17: Hyper-parameter settings of LoRA-Dash on the NLU task.

    Hyper-parameter   MNLI   SST-2  CoLA   QQP    QNLI   RTE     MRPC   STS-B
    Optimizer         AdamW (all tasks)
    Warmup Ratio      0.1 (all tasks)
    LR Schedule       Linear (all tasks)
    Rank r            2 & 8 (all tasks)
    LoRA alpha        4 & 16 (all tasks)
    Max Seq. Len.     256    128    64     320    512    320     320    128
    Batch Size        32     32     32     32     32     32      32     32
    Learning Rate     5e-4   8e-4   8e-4   1e-3   5e-4   1.2e-3  1e-3   5e-4
    Epochs            12     24     25     5      5      50      30     25

"The fine-tuning process is conducted with a learning rate of 1e-4 and a batch size of 4. We train the model over 500 steps."
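The quoted setup (t = 100 "pre-launch" steps, then s = 8 task-specific directions) can be illustrated with a minimal sketch. Since the authors have not released code, this is not their implementation: it assumes a simplified selection rule in which the accumulated LoRA update ΔW = BA is projected onto the singular directions of the frozen weight W0 and the s directions with the largest relative coordinate change are kept. The function name `select_tsd_indices` and all shapes are hypothetical.

```python
import numpy as np

def select_tsd_indices(W0, delta_W, s=8):
    """Return indices of the s singular directions of W0 most altered by delta_W.

    Hedged sketch: "change rate" is taken as |u_i^T delta_W v_i| / sigma_i,
    a simplifying assumption rather than the paper's exact criterion.
    """
    U, sigma, Vt = np.linalg.svd(W0, full_matrices=False)
    # Coordinate of the update along each singular direction u_i v_i^T.
    deltas = np.abs(np.einsum("di,dk,ki->i", U, delta_W, Vt.T))
    rates = deltas / (sigma + 1e-8)
    return np.argsort(rates)[::-1][:s]

rng = np.random.default_rng(0)
W0 = rng.standard_normal((64, 64))        # frozen pre-trained weight
B = rng.standard_normal((64, 4)) * 0.1    # LoRA factors as they might look
A = rng.standard_normal((4, 64)) * 0.1    # after t = 100 pre-launch steps
idx = select_tsd_indices(W0, B @ A, s=8)
print(idx.shape)  # (8,)
```

After this selection step, LoRA-Dash's "dash phase" concentrates the remaining fine-tuning budget on the identified directions; the sketch above only covers the identification step.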