CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization
Authors: Yanxia Deng, Aozhong Zhang, Selcuk Gurses, Naigang Wang, Zi Yang, Penghang Yin
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we introduce CLoQ (Calibrated LoRA initialization for Quantized LLMs), a simplistic initialization strategy designed to overcome these challenges. Our approach focuses on minimizing the layer-wise discrepancy between the original LLM and its quantized counterpart with LoRA components during initialization. By leveraging a small calibration dataset, CLoQ quantizes a pre-trained LLM and determines the optimal LoRA components for each layer, ensuring a strong foundation for subsequent fine-tuning. ... We validate the efficacy of CLoQ across multiple tasks such as language generation, arithmetic reasoning, and commonsense reasoning, demonstrating that it consistently outperforms existing LoRA fine-tuning methods for quantized LLMs, especially at 2-bit. |
| Researcher Affiliation | Collaboration | Yanxia Deng, Department of Mathematics and Statistics, University at Albany, SUNY; Aozhong Zhang, Department of Mathematics and Statistics, University at Albany, SUNY; Selcuk Gurses, Department of Mathematics and Statistics, University at Albany, SUNY; Naigang Wang, IBM T. J. Watson Research Center; Zi Yang, Department of Mathematics and Statistics, University at Albany, SUNY; Penghang Yin, Department of Mathematics and Statistics, University at Albany, SUNY |
| Pseudocode | Yes | Algorithm 1: CLoQ for initializing one linear layer |
| Open Source Code | Yes | The code is available at https://github.com/AozhongZhang/CLoQ |
| Open Datasets | Yes | We test CLoQ on Llama2-7b, Llama2-13b Touvron et al. (2023), Llama3-8b Grattafiori et al. (2024) and Mistral-7b-v0.1 Jiang et al. (2023) models. Following prior works Frantar et al. (2022a), we randomly sample 128 instances, each with a context length of 2048 tokens, from the WikiText-2 dataset Merity et al. (2016) to serve as the calibration set for quantization. Then, we fine-tune and evaluate the models on WikiText-2 for language modeling. For single arithmetic reasoning tasks, we fine-tune and evaluate on the GSM8K Cobbe et al. (2021). For multi arithmetic reasoning, we fine-tune the models on Math10K Hu et al. (2023) and then evaluate the test sets of AQuA Ling et al. (2017), GSM8K, MAWPS Koncel-Kedziorski et al. (2016) and SVAMP Patel et al. (2021). For commonsense reasoning tasks, we fine-tune the models on Commonsense170K Hu et al. (2023) and evaluate on eight representative tasks: BoolQ Clark et al. (2019), PIQA Bisk et al. (2020), SIQA Sap et al. (2019), HellaSwag Zellers et al. (2019), WinoGrande Sakaguchi et al. (2021), ARC-e, ARC-c Clark et al. (2018) and OBQA Mihaylov et al. (2018). |
| Dataset Splits | Yes | We randomly sample 128 instances, each with a context length of 2048 tokens, from the WikiText-2 dataset Merity et al. (2016) to serve as the calibration set for quantization. Then, we fine-tune and evaluate the models on WikiText-2 for language modeling. For single arithmetic reasoning tasks, we fine-tune and evaluate on the GSM8K Cobbe et al. (2021). For multi arithmetic reasoning, we fine-tune the models on Math10K Hu et al. (2023) and then evaluate the test sets of AQuA Ling et al. (2017), GSM8K, MAWPS Koncel-Kedziorski et al. (2016) and SVAMP Patel et al. (2021). For commonsense reasoning tasks, we fine-tune the models on Commonsense170K Hu et al. (2023) and evaluate on eight representative tasks... Appendix A.1 Language modeling: To study the capability of CLoQ, we fine-tune quantized models on the WikiText-2 training set and measure perplexity on the validation set. Appendix A.2 Arithmetic reasoning: To assess CLoQ's arithmetic reasoning capability, we fine-tune quantized models using the GSM8K training set and evaluate their accuracy on the test set. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs with 80GB of memory. |
| Software Dependencies | No | The paper mentions using AdamW (Loshchilov, 2017) as the optimizer but does not specify versions for any key software components, libraries, or programming languages. |
| Experiment Setup | Yes | The detailed hyperparameter settings for all our experiments are presented in the Appendix A. ... Table 11: Hyper-parameter for the finetuning of Llama2. Table 12: Best learning rate for Llama2-7B and Llama2-13B on the Wiki Text-2, GSM8K, and multiple Arithmetic Reasoning tasks. |
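To make the pseudocode finding above concrete, the following is a minimal NumPy sketch of the layer-wise calibrated initialization the abstract describes: given calibration activations `X`, original weights `W`, and quantized weights `Q`, pick a rank-`r` LoRA correction `A @ B` that minimizes the calibrated discrepancy ||X W - X (Q + A B)||_F. This is an illustrative reconstruction under stated assumptions, not the authors' released implementation; the names `cloq_init`, `X`, `W`, `Q`, and the fake quantizer are hypothetical.

```python
import numpy as np

def cloq_init(W, Q, X, r):
    """Sketch of a calibrated rank-r LoRA initialization.

    Minimizes ||X W - X (Q + A @ B)||_F over rank-r corrections,
    using the classical closed form: take the best rank-r
    approximation of the calibrated residual X (W - Q), then map it
    back to weight space with the pseudo-inverse of X.
    """
    R = X @ (W - Q)                               # calibrated residual, shape (n, k)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Rr = (U[:, :r] * s[:r]) @ Vt[:r]              # best rank-r approximation of R
    L = np.linalg.pinv(X) @ Rr                    # rank-<=r correction in weight space
    # Factor L into LoRA components A (d x r) and B (r x k).
    Ul, sl, Vtl = np.linalg.svd(L, full_matrices=False)
    A = Ul[:, :r] * np.sqrt(sl[:r])
    B = np.sqrt(sl[:r])[:, None] * Vtl[:r]
    return A, B
```

A quick sanity check: with any calibration batch, the corrected layer `Q + A @ B` should never be worse than the quantized layer alone on the objective, since the rank-r optimum includes the zero correction as a feasible point.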