Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
Authors: Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that with a 40% parameter-compression rate, our method achieves a perplexity of 9.07 on the Wikitext2 dataset with the compressed LLaMA-7B model... Specifically, we apply Dobi-SVD to compress the popular VLM, LLaVA-v1.5-7B. Experiments demonstrate that our method improves throughput by 1.2 times. |
| Researcher Affiliation | Academia | Qinsi Wang¹, Jinghan Ke², Masayoshi Tomizuka², Kurt Keutzer², Chenfeng Xu² (¹Duke University, ²University of California, Berkeley) |
| Pseudocode | Yes | Algorithm 1 Differentiable Algorithm for Finding Optimal k ... Algorithm 2 Computing the Theoretical Optimal Rank-k Weight Matrix W̃ via IPCA ... Algorithm 3 Mixed-Precision Quantization Storage ... Algorithm 4 Custom Low-Rank SVD Forward Pass ... Algorithm 5 Custom Low-Rank SVD Backward Pass |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for their described methodology. |
| Open Datasets | Yes | We first test the model's in-domain performance on the C4 (Sakaguchi et al., 2021a), Wikitext2 (Merity et al., 2016), and PTB (Marcus et al., 1993), respectively. In addition, we also evaluate it on seven common sense reasoning datasets (OpenbookQA (Mihaylov et al., 2018), WinoGrande (Sakaguchi et al., 2021b), HellaSwag (Zellers et al., 2019), PIQA (Bisk et al., 2020), MathQA (Amini et al., 2019), ARC-e, and ARC-c (Clark et al., 2018)). |
| Dataset Splits | No | For LLM, we randomly select 256 samples from the WikiText2 dataset as the training set, with each sample containing 2048 tokens. For VLM, we randomly select 256 samples from the TextQA dataset as the training set, with each sample containing 660 tokens. The hyperparameter settings used during training are listed in Table 7, which states 'Number of Val sample 16'. |
| Hardware Specification | Yes | We hope that the inference speedup of up to 12.4x on 12GB NVIDIA Titan Xp GPUs and 3x on 80GB A100 GPUs for LLMs... To demonstrate the hardware efficiency of Dobi-SVD, we deploy it on hardware devices. We use two representative devices. One is the 80GB NVIDIA A100, which represents a high-performance GPU, and the other is the 12GB NVIDIA Titan Xp, which represents a low-performance GPU. |
| Software Dependencies | No | The paper mentions software components like 'Python's support for only fp32 SVD', the 'bnb library for model quantization', 'GPTQ-4bit', and the 'Transformers library', but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The hyperparameters involved in our algorithm mainly include β, which controls the smoothness of the tanh function; γ, which controls the minimum threshold of singular values during backpropagation; and K, the number of terms retained in the Taylor expansion. In our experiments, we set β = 10, γ = 1×10⁻¹⁰, and K = 10. Table 7: Hyperparameter settings during training. Sequence Length 2048, Number of Train Samples 256, Number of Val Samples 16, Batch Size 32, Epoch Number 320, Scheduler Cosine, Optimizer Adam, Scheduler lr 0.1, Warm-up Steps 0. |
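To make the reported β and γ hyperparameters concrete, the sketch below shows one plausible reading of SVD-based weight compression with a smooth tanh gate over singular-value indices standing in for Dobi-SVD's differentiable rank selection. This is NOT the authors' code: `soft_rank_mask` and `svd_compress` are hypothetical helpers, and the gate formula is an illustrative assumption; only the values β = 10 and γ = 1×10⁻¹⁰ come from the paper.

```python
import numpy as np

def soft_rank_mask(n, k, beta=10.0):
    """Smooth stand-in for the hard truncation mask 1[i < k] over n
    singular values; larger beta sharpens the tanh toward a step."""
    i = np.arange(n)
    return 0.5 * (1.0 - np.tanh(beta * (i - k + 0.5) / n))

def svd_compress(W, k, beta=10.0, gamma=1e-10):
    """Approximate W by a (softly) rank-k truncated SVD.
    gamma floors tiny singular values, mirroring the paper's minimum
    threshold used during backpropagation."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.maximum(s, gamma)                  # clip near-zero singular values
    mask = soft_rank_mask(len(s), k, beta)    # smooth keep/drop weights
    return (U * (s * mask)) @ Vt              # low-rank reconstruction

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_half = svd_compress(W, k=32)                # ~50% rank kept
```

Because the mask is differentiable in k (via the tanh), a gradient-based search over k is possible, which is the general mechanism the paper's Algorithm 1 optimizes; the exact gate used by Dobi-SVD may differ.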