RandLoRA: Full-rank parameter-efficient fine-tuning of large models

Authors: Paul Albert, Frederic Z. Zhang, Hemanth Saratchandran, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experimentation across vision, language, and vision-language benchmarks, we systematically evaluate the limitations of LoRA and existing random basis methods. Our findings reveal that full-rank updates are beneficial across vision and language tasks individually, and even more so for vision-language tasks, where RandLoRA significantly reduces and sometimes eliminates the performance gap between standard fine-tuning and LoRA, demonstrating its efficacy.
Researcher Affiliation | Academia | Australian Institute for Machine Learning, The University of Adelaide, {firstname.lastname}@adelaide.edu.au
Pseudocode | No | The paper includes mathematical equations and theoretical analysis but does not present any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | https://github.com/PaulAlbert31/RandLoRA
Open Datasets | Yes | Through extensive experimentation across vision, language, and vision-language benchmarks... We fine-tune on 21 datasets (Appendix C.1, Table 7) and evaluate {1, 2, 4, 16}-shot learning and performance with 50% and 100% training data. ... We add ImageNet (Krizhevsky et al., 2012) to the dataset pool to scale up to 22 classification datasets. ... We evaluate RandLoRA for fine-tuning LLMs on eight commonsense reasoning tasks (see Appendix C.4). ... General Language Understanding Evaluation (GLUE) (Wang et al., 2019) and End-to-End (E2E) (Novikova et al., 2017) natural language generation benchmarks.
Dataset Splits | Yes | We fine-tune on 21 datasets (Appendix C.1, Table 7) and evaluate {1, 2, 4, 16}-shot learning and performance with 50% and 100% training data. ... We fine-tune Qwen2 (0.5B), Phi3 (3B), and Llama3 (8B) models and assess data efficiency by training on both a 170,000-sample full dataset and a 15,000-sample subset, following Hu et al. (2023).
Hardware Specification | No | Figure 2 and Figure 3 report max GPU VRAM usage during training, but specific GPU or CPU models are not detailed. The acknowledgments mention 'supercomputing resources provided by the Phoenix HPC service at the University of Adelaide', but without specific hardware configurations.
Software Dependencies | No | The paper does not explicitly list any software dependencies with specific version numbers.
Experiment Setup | No | We perform a hyperparameter search to identify optimal settings for LoRA, NoLA, VeRA, and RandLoRA to ensure a fair comparison. More details about the experimental settings can be found in Appendix C. While the paper states that details are in an appendix, the specific hyperparameters or system-level training settings are not present in the main text provided.
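The core claim assessed above is that full-rank updates help where rank-limited LoRA updates fall short. A minimal numpy sketch of the underlying linear-algebra intuition follows; it is an illustration, not the paper's exact parameterization (the base names `B`, `A` and the diagonal scalings `lam` are placeholders): a single low-rank product is capped at rank r, but a sum of several fixed random low-rank products with learned scalings can span the full dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_bases = 32, 4, 8  # d x d weight update, rank-r bases, 8 bases

# Fixed (non-trainable) random bases, as in random-basis PEFT methods.
B = [rng.standard_normal((d, r)) for _ in range(n_bases)]
A = [rng.standard_normal((r, d)) for _ in range(n_bases)]
# Small trainable diagonal scalings (random stand-ins for learned values here).
lam = [np.diag(rng.standard_normal(r)) for _ in range(n_bases)]

# A single LoRA-style product is limited to rank r...
single = B[0] @ lam[0] @ A[0]
# ...while the sum of n_bases scaled random products is full rank
# with probability 1 for continuous random entries.
delta_W = sum(b @ l @ a for b, l, a in zip(B, lam, A))

print(np.linalg.matrix_rank(single))   # 4
print(np.linalg.matrix_rank(delta_W))  # 32 (almost surely)
```

The trainable parameter count stays small because only the diagonal scalings are learned, while the random bases are frozen; this is the general trade-off the paper's experiments evaluate against LoRA and standard fine-tuning.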