Block Circulant Adapter for Large Language Models
Authors: Xinyu Ding, Meiqi Wang, Siyu Liao, Zhongfeng Wang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method uses 14× fewer parameters than VeRA, is 16× smaller than LoRA, and uses 32× fewer FLOPs than FourierFT, while maintaining close or better task performance. Our approach presents a promising frequency-domain way to fine-tune large models on downstream tasks. Extensive experiments on standard NLP tasks and datasets substantiate the effectiveness of our BCA method. We demonstrate that BCA not only matches the performance of prior works like LoRA and FourierFT but also achieves this with substantially lower computational and storage costs, highlighting the advantages of our approach for LLM fine-tuning. |
| Researcher Affiliation | Academia | Xinyu Ding, Meiqi Wang, Siyu Liao and Zhongfeng Wang, Sun Yat-sen University. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about making its code publicly available, nor does it provide any links to code repositories. |
| Open Datasets | Yes | We fine-tune both the RoBERTa model [Liu et al., 2019] and the LLaMA2-7B model [Touvron et al., 2023], which are the most frequently selected models for fine-tuning adapters. For RoBERTa models, we evaluate on the GLUE benchmark dataset, a standard multi-task dataset proposed by [Wang et al., 2018] for natural language understanding. Following [Gao et al., 2024], we run experiments on the Corpus of Linguistic Acceptability (CoLA) by [Warstadt et al., 2019], Stanford Sentiment Treebank (SST-2) by [Socher et al., 2013], Microsoft Research Paraphrase Corpus (MRPC) by [Dolan and Brockett, 2005], Semantic Textual Similarity Benchmark (STS-B) by [Cer et al., 2017], Question Natural Language Inference (QNLI) by [Rajpurkar, 2016], and Recognizing Textual Entailment (RTE) by [Dagan et al., 2005]. For the LLaMA2-7B model, we train on a cleaned version of Alpaca [Rohan Taori and Hashimoto, 2023], which contains 51K instruction-response pairs for instruction tuning, and evaluate on MT-Bench [Zheng et al., 2023]. We also evaluate on the GSM8K dataset [Cobbe et al., 2021], a high-quality dataset with 8.5K grade school math word problems. |
| Dataset Splits | Yes | For RoBERTa models, we evaluate on the GLUE benchmark dataset, a standard multi-task dataset proposed by [Wang et al., 2018] for natural language understanding. For the LLaMA2-7B model, we train on a cleaned version of Alpaca [Rohan Taori and Hashimoto, 2023], which contains 51K instruction-response pairs for instruction tuning, and evaluate on MT-Bench [Zheng et al., 2023]. We also evaluate on the GSM8K dataset [Cobbe et al., 2021], a high-quality dataset with 8.5K grade school math word problems. Following [Gao et al., 2024], we perform 5 runs on each dataset with different random seeds and report the median metric value with standard deviation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | Our block circulant adapter is implemented using the PyTorch framework [Paszke et al., 2019]. In practice, we set batch size 32, training iteration 10000 and learning rate 0.1 for the Adadelta optimizer [Zeiler, 2012]. The paper mentions PyTorch and Adadelta but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In practice, we set batch size 32, training iteration 10000 and learning rate 0.1 for the Adadelta optimizer [Zeiler, 2012]. To achieve a stable training process, we propose to scale down the learning rate α by the block size p: α → α/p. The learning rate 0.06 is the setting for training the FourierFT adapter, which is also a Fourier-domain-based method like ours. Following [Gao et al., 2024], we apply block circulant fine-tuning on the query and value weight matrices inside the attention layers of the two RoBERTa models and the LLaMA2-7B model fine-tuned on the Alpaca dataset. Following [Azizi et al., 2024], we fine-tune the MHSA and FFN layers of the LLaMA2-7B model on the GSM8K dataset. The classification head is fully fine-tuned. Following [Gao et al., 2024], we perform 5 runs on each dataset with different random seeds and report the median metric value with standard deviation. |
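To make the reviewed method concrete: a block circulant adapter stores, for each p × p block of the weight update, only the block's first column, and applies the matrix-vector product via FFT (circulant multiplication is circular convolution, hence the 32× FLOP claim over dense alternatives). The sketch below is an illustrative NumPy reconstruction under that standard definition, not the authors' released code; the names `block_circulant_matvec` and `circulant` are my own.

```python
import numpy as np

def block_circulant_matvec(blocks, x, p):
    """Multiply a block-circulant matrix by a vector via FFT.

    blocks: (q, k, p) array holding the first column of each p x p
            circulant block, for a (q*p) x (k*p) matrix.
    x:      input vector of length k * p.
    Cost is O(q * k * p log p) instead of O(q * k * p^2) for a dense
    product, and storage is q*k*p parameters instead of q*k*p^2.
    """
    q, k, _ = blocks.shape
    xb = x.reshape(k, p)
    # FFT turns circular convolution (circulant matvec) into an
    # elementwise product in the frequency domain.
    Xf = np.fft.fft(xb, axis=-1)                 # (k, p)
    Cf = np.fft.fft(blocks, axis=-1)             # (q, k, p)
    Yf = (Cf * Xf[None, :, :]).sum(axis=1)       # (q, p): sum over input blocks
    return np.fft.ifft(Yf, axis=-1).real.reshape(q * p)

def circulant(c):
    """Dense p x p circulant matrix with first column c (for checking)."""
    p = len(c)
    return np.array([[c[(i - j) % p] for j in range(p)] for i in range(p)])
```

The review notes that training stability requires scaling the learning rate by the block size (α → α/p); in this sketch that would simply mean passing, e.g., `lr=0.1 / p` to the optimizer. The FFT path can be verified against the dense matrix built from `circulant` blocks.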