RaSA: Rank-Sharing Low-Rank Adaptation
Authors: Zhiwei He, Zhaopeng Tu, Xing Wang, Xingyu Chen, Zhijie Wang, Jiahao Xu, Tian Liang, Wenxiang Jiao, Zhuosheng Zhang, Rui Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretically grounded and empirically validated approach demonstrates that RaSA not only maintains the core advantages of LoRA but also significantly boosts performance in challenging code and math tasks. Finally, we conducted experiments on mathematical reasoning and code generation, demonstrating that the lower reconstruction error translates to improved downstream task performance. |
| Researcher Affiliation | Collaboration | 1Shanghai Jiao Tong University 2Tencent AI Lab |
| Pseudocode | No | The paper describes update rules and derivations using mathematical equations (Equations 19, 24-26) in Appendix A, but does not present them in a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code, data and scripts are available at: https://github.com/zwhe99/RaSA. |
| Open Datasets | Yes | Code Generation: We used Magicoder-Evol-Instruct-110k (Wei et al., 2024) as the training data, ... We used Humaneval+ (Liu et al., 2023) as the test set, an extension of the Humaneval benchmark... Mathematical Reasoning: We used MetaMathQA (Yu et al., 2024) as the training data, ... We used MATH (Hendrycks et al., 2021) as the test set... Specifically, we calculate prediction accuracies on the following three benchmarks: (1) HellaSwag (Zellers et al., 2019): ...; (2) WinoGrande (Sakaguchi et al., 2019): ...; (3) ARC-Challenge (Clark et al., 2018): ... |
| Dataset Splits | No | The paper designates specific datasets for training (e.g., Magicoder-Evol-Instruct-110k, MetaMathQA) and separate datasets for testing (e.g., Humaneval+, MATH), but does not specify how these datasets were split into training, validation, or test sets by the authors within their own experimental setup. For scaling analysis, it mentions 'random sampling of 25% and 50% instances from the SFT data for the mathematics reasoning task', but this is not a general dataset split for all experiments. |
| Hardware Specification | Yes | All experiments for the 7-8B models were conducted on 1 node with 8 A100-40G GPUs. For the 70B and MoE models, we used 8 nodes. |
| Software Dependencies | No | The paper mentions software tools like 'BigCode Evaluation Harness' and 'LLMs Evaluation Harness' and libraries such as 'sympy' and the 'LionW optimizer', but does not provide specific version numbers for any of these components. |
| Experiment Setup | Yes | Following common practice (Kopiczko et al., 2024; Jiang et al., 2024), we used pre-trained models rather than instruction-tuned ones. We applied PEFTs on all linear modules from attention (Wq, Wk, Wv, Wo) and feed-forward networks (Wup, Wdown, Wgate). We set the model hyper-parameters based on the optimal configurations from Biderman et al. (2024), employing the decoupled LionW optimizer with a batch size of 192, and training for 8 epochs with a learning rate of 5e-4 by default. For RaSA, we set k = max(r/8, 1) based on the analysis in Section 3.2. More details are provided in appendix C. |
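The setup row above references RaSA's rank-sharing hyper-parameter k = max(r/8, 1). As a rough intuition aid only, the sketch below illustrates one simplified reading of rank sharing: each layer's low-rank update combines k rank components drawn from a pool shared across layers with r − k layer-specific components. All names and the exact factorization here are hypothetical simplifications, not the authors' implementation (see their repository for the real code).

```python
import numpy as np

# Hypothetical sketch of rank sharing in low-rank adaptation (NOT the
# authors' RaSA implementation). Each layer's update DeltaW = B @ A uses
# r total ranks, of which k = max(r // 8, 1) come from a shared pool.

rng = np.random.default_rng(0)
d, r, n_layers = 64, 8, 4          # model dim, total rank, number of layers
k = max(r // 8, 1)                 # shared ranks per layer (paper's default rule)

# Shared low-rank factors, reused by every layer.
B_shared = rng.normal(size=(d, k))
A_shared = rng.normal(size=(k, d))

# Layer-specific factors cover the remaining r - k ranks.
layers = [(rng.normal(size=(d, r - k)), rng.normal(size=(r - k, d)))
          for _ in range(n_layers)]

def delta_w(B_local, A_local):
    """Combine shared and layer-specific ranks into one rank-<=r update."""
    B = np.concatenate([B_shared, B_local], axis=1)   # (d, r)
    A = np.concatenate([A_shared, A_local], axis=0)   # (r, d)
    return B @ A                                      # (d, d), rank <= r

updates = [delta_w(Bl, Al) for Bl, Al in layers]
```

Each `updates[i]` has the same parameter budget as a rank-r LoRA update, but the k shared components are trained on gradients from every layer, which is the intuition behind sharing ranks across layers.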