Decoupling Angles and Strength in Low-rank Adaptation
Authors: Massimo Bini, Leander Girrbach, Zeynep Akata
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our proposed DeLoRA method for image generation, natural language understanding, and instruction tuning tasks. We begin by providing a detailed description of these tasks and their relevance. To justify our design choices, we present a comprehensive ablation study that highlights the key innovations of DeLoRA. Finally, we demonstrate that DeLoRA not only matches or exceeds the performance of LoRA and other state-of-the-art methods but also exhibits superior robustness. |
| Researcher Affiliation | Academia | ¹University of Tübingen, Tübingen AI Center, ²Helmholtz Munich, ³Technical University of Munich, Munich Center for Machine Learning, MDSI |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ExplainableML/DeLoRA. |
| Open Datasets | Yes | The dataset, sourced from (Ruiz et al., 2023), comprises 30 subjects... For training and evaluation, we utilize semantic maps and images from the ADE20K dataset (Zhou et al., 2019)... on the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018)... finetuning LLaMA-2-7B (Touvron et al., 2023b) on the Alpaca dataset (Taori et al., 2023). |
| Dataset Splits | Yes | Following Wu et al. (2024c), for each benchmark task, we split the publicly available validation set in two subsets as reported in Table 7. When validation sets are larger than 2K, a 1K subset is used as new validation set, and the remaining as test set, otherwise the validation is split in two equally sized subsets. We use the new validation set to tune the hyperparameters on seed 42. Then, best hyperparameters are used to evaluate test performance for seeds 42, 43, 44, 45, 46. |
| Hardware Specification | Yes | The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer JUWELS (Alvarez, 2021) at Jülich Supercomputing Centre (JSC). |
| Software Dependencies | No | The paper mentions various models and tasks like Stable Diffusion, RoBERTa-base, and LLaMA-2, and uses 'bfloat16 precision', but does not provide specific version numbers for any software libraries, frameworks, or dependencies used for implementation or experimentation. |
| Experiment Setup | Yes | For LoRA and DoRA we followed best practices and fixed lambda to twice the rank during hyperparameter search. Optimal learning rate for both methods is 6e-4. For DeLoRA we fixed the λ scaling parameter to 1e-3, and found an optimal learning rate of 2e-2 for the BA matrices. ... For larger datasets (MNLI, SST-2, QNLI, QQP) we fix the λ scaling learning rate to 3e-3, while for smaller datasets we fix it to 1e-2. For other hyperparameters we run a small grid search. Best values are reported in Table 9. |
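The Dataset Splits row describes a concrete procedure for carving test sets out of public validation sets. A minimal sketch of that rule follows; the function name `split_validation` and the use of index lists are illustrative assumptions, not the authors' code.

```python
import random

def split_validation(val_indices, seed=42):
    """Split a public validation set into (new_validation, test) subsets,
    per the rule quoted above: if the set has more than 2K examples,
    a 1K subset becomes the new validation set and the remainder the
    test set; otherwise the set is split into two equal halves."""
    indices = list(val_indices)
    random.Random(seed).shuffle(indices)  # seed 42 matches the tuning seed
    val_size = 1000 if len(indices) > 2000 else len(indices) // 2
    return indices[:val_size], indices[val_size:]

# A 9,815-example validation set yields a 1K validation subset
# and an 8,815-example test subset.
new_val, test = split_validation(range(9815))
```

Hyperparameters would then be tuned on `new_val` with seed 42, and the best configuration evaluated on `test` across seeds 42-46, as the report states.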
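The Experiment Setup row hints at why DeLoRA uses a fixed λ with its own learning rate, separate from the BA matrices: the title's "decoupling" refers to normalizing the low-rank directions so that λ alone controls update strength. The sketch below is an assumption reconstructed from the paper title and the λ/BA split quoted above, not the authors' reference implementation.

```python
import numpy as np

def delora_delta(B, A, lam, rank):
    """Hypothetical decoupled low-rank update: each rank-one component
    b_i a_i^T is normalized to unit Frobenius norm, so the angular
    direction comes from B and A while the overall strength is bounded
    by the scalar lam (||delta||_F <= lam by the triangle inequality)."""
    b_norm = np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1e-8)
    a_norm = np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-8)
    return (lam / rank) * (B / b_norm) @ (A / a_norm)

rng = np.random.default_rng(0)
B, A = rng.standard_normal((8, 4)), rng.standard_normal((4, 8))
delta = delora_delta(B, A, lam=1e-3, rank=4)
```

Under this parameterization, rescaling B or A leaves the update unchanged, which would explain why the BA learning rate (2e-2) and the λ learning rate (1e-2 or 3e-3) can be tuned independently.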