SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Authors: Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SVD-LLM on 10 datasets and seven models from three different LLM families at three different scales. Our results demonstrate the superiority of SVD-LLM over state-of-the-arts, especially at high model compression ratios. |
| Researcher Affiliation | Academia | Xin Wang1 Yu Zheng2 Zhongwei Wan1 Mi Zhang1 1The Ohio State University 2Michigan State University |
| Pseudocode | Yes | The pseudocode of SVD-LLM is provided in Appendix A.1. |
| Open Source Code | Yes | https://github.com/AIoT-MLSys-Lab/SVD-LLM |
| Open Datasets | Yes | To demonstrate the generalizability of SVD-LLM, we conduct our evaluation on a total of 10 datasets and seven models from three different LLM families (LLaMA, OPT, and Mistral) at three different scales (7B, 13B, 30B), and evaluate the performance of SVD-LLM on both GPU and CPU. We highlight three of our findings: SVD-LLM outperforms state-of-the-art SVD-based LLM compression methods FWSVD and ASVD across all 10 datasets, three LLM families at three scales by a large margin. It also explicitly lists "WikiText-2 (Merity et al., 2017), and C4 (Raffel et al., 2020)), six classification datasets (OpenbookQA (Mihaylov et al., 2018), WinoGrande (Sakaguchi et al., 2020), HellaSwag (Zellers et al., 2019), Arc_e (Clark et al., 2018), PIQA (Bisk et al., 2020), MathQA (Amini et al., 2019)), and two generation datasets (TruthfulQA (Lin et al., 2022) and GSM8K (Cobbe et al., 2021)) with the LM-Evaluation-Harness framework (Gao et al., 2023)." and "Alpaca (Taori et al., 2023) dataset with 50K samples". |
| Dataset Splits | No | The paper mentions using "256 samples from WikiText-2 as the calibration data" and "Alpaca dataset with 50K samples for parameter update", but does not specify the train/test/validation splits for the evaluation datasets used in the experiments. |
| Hardware Specification | Yes | The inference efficiency experiment is conducted on both NVIDIA A100 GPU and AMD EPYC 7643 CPU while the other experiments are conducted on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'LM-Evaluation-Harness framework', but does not provide any specific version numbers for software dependencies. |
| Experiment Setup | No | The paper mentions using "256 samples from WikiText-2 as the calibration data" and "Alpaca dataset with 50K samples for parameter update", and refers to "LoRA fine-tuning", but does not provide specific hyperparameters such as learning rate, batch size, or number of epochs for the fine-tuning process. |
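For context on the compression technique being assessed, the sketch below illustrates the generic idea behind SVD-based LLM compression: a weight matrix is replaced by two low-rank factors obtained from a truncated SVD, shrinking the parameter count. This is a minimal NumPy illustration of plain truncated SVD only; it omits SVD-LLM's truncation-aware whitening of the calibration data and its parameter-update step, and all names here (`svd_compress`, the 4096x4096 shape, rank 1024) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def svd_compress(W, rank):
    """Low-rank approximation of a weight matrix via truncated SVD.

    Replaces W (m x n) with two factors A (m x rank) and B (rank x n),
    so a linear layer y = W x becomes y = A (B x).
    NOTE: illustrative sketch only; SVD-LLM additionally whitens W with
    calibration data before truncation, which this function does not do.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096))   # hypothetical layer size
A, B = svd_compress(W, rank=1024)

# Parameter count: 2 * 4096 * 1024 factors vs 4096 * 4096 original -> 0.5
ratio = (A.size + B.size) / W.size
print(ratio)  # 0.5, i.e. 2x compression at rank 1024
```

The compression ratio is controlled by the retained rank: keeping rank r of an m x n matrix stores r(m + n) parameters instead of mn, so ranks below mn / (m + n) yield a net reduction.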