SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Authors: Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SVD-LLM on 10 datasets and seven models from three different LLM families at three different scales. Our results demonstrate the superiority of SVD-LLM over state-of-the-arts, especially at high model compression ratios. |
| Researcher Affiliation | Academia | Xin Wang1 Yu Zheng2 Zhongwei Wan1 Mi Zhang1 1The Ohio State University 2Michigan State University |
| Pseudocode | Yes | The pseudocode of SVD-LLM is provided in Appendix A.1. |
| Open Source Code | Yes | https://github.com/AIoT-MLSys-Lab/SVD-LLM |
| Open Datasets | Yes | To demonstrate the generalizability of SVD-LLM, we conduct our evaluation on a total of 10 datasets and seven models from three different LLM families (LLaMA, OPT, and Mistral) at three different scales (7B, 13B, 30B), and evaluate the performance of SVD-LLM on both GPU and CPU. We highlight three of our findings: SVD-LLM outperforms state-of-the-art SVD-based LLM compression methods FWSVD and ASVD across all 10 datasets, three LLM families at three scales by a large margin. It also explicitly lists "WikiText-2 (Merity et al., 2017), and C4 (Raffel et al., 2020)), six classification datasets (OpenbookQA (Mihaylov et al., 2018), WinoGrande (Sakaguchi et al., 2020), HellaSwag (Zellers et al., 2019), Arc_e (Clark et al., 2018), PIQA (Bisk et al., 2020), MathQA (Amini et al., 2019)), and two generation datasets (TruthfulQA (Lin et al., 2022) and GSM8K (Cobbe et al., 2021)) with the LM-Evaluation-Harness framework (Gao et al., 2023)." and "Alpaca (Taori et al., 2023) dataset with 50K samples". |
| Dataset Splits | No | The paper mentions using "256 samples from WikiText-2 as the calibration data" and "Alpaca dataset with 50K samples for parameter update", but does not specify the train/test/validation splits for the evaluation datasets used in the experiments. |
| Hardware Specification | Yes | The inference efficiency experiment is conducted on both NVIDIA A100 GPU and AMD EPYC 7643 CPU while the other experiments are conducted on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'LM-Evaluation-Harness framework', but does not provide any specific version numbers for software dependencies. |
| Experiment Setup | No | The paper mentions using "256 samples from WikiText-2 as the calibration data" and "Alpaca dataset with 50K samples for parameter update", and refers to "LoRA fine-tuning", but does not provide specific hyperparameters such as learning rate, batch size, or number of epochs for the fine-tuning process. |
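For context on the compression technique being assessed, the sketch below illustrates the generic idea behind SVD-based LLM compression: a weight matrix is replaced by two low-rank factors obtained from a truncated SVD, shrinking the parameter count. This is a minimal NumPy illustration of plain truncated SVD only; it omits SVD-LLM's truncation-aware whitening of the calibration data and its parameter-update step, and all names here (`svd_compress`, the 4096x4096 shape, rank 1024) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def svd_compress(W, rank):
    """Low-rank approximation of a weight matrix via truncated SVD.

    Replaces W (m x n) with two factors A (m x rank) and B (rank x n),
    so a linear layer y = W x becomes y = A (B x).
    NOTE: illustrative sketch only; SVD-LLM additionally whitens W with
    calibration data before truncation, which this function does not do.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096))   # hypothetical layer size
A, B = svd_compress(W, rank=1024)

# Parameter count: 2 * 4096 * 1024 factors vs 4096 * 4096 original -> 0.5
ratio = (A.size + B.size) / W.size
print(ratio)  # 0.5, i.e. 2x compression at rank 1024
```

The compression ratio is controlled by the retained rank: keeping rank r of an m x n matrix stores r(m + n) parameters instead of mn, so ranks below mn / (m + n) yield a net reduction.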