Block Circulant Adapter for Large Language Models
Authors: Xinyu Ding, Meiqi Wang, Siyu Liao, Zhongfeng Wang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method uses 14× fewer parameters than VeRA, is 16× smaller than LoRA, and uses 32× fewer FLOPs than FourierFT, while maintaining close or better task performance. Our approach presents a promising frequency-domain way to fine-tune large models on downstream tasks. Extensive experiments on standard NLP tasks and datasets substantiate the effectiveness of our BCA method. We demonstrate that BCA not only matches the performance of prior works like LoRA and FourierFT but also achieves this with substantially lower computational and storage costs, highlighting the advantages of our approach for LLM fine-tuning. |
| Researcher Affiliation | Academia | Xinyu Ding, Meiqi Wang, Siyu Liao and Zhongfeng Wang, Sun Yat-sen University. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about making its code publicly available, nor does it provide any links to code repositories. |
| Open Datasets | Yes | We fine-tune both the RoBERTa model [Liu et al., 2019] and the LLaMA2-7B model [Touvron et al., 2023], which are the most frequently selected models for fine-tuning adapters. For RoBERTa models, we evaluate on the GLUE benchmark dataset, a standard multi-task dataset proposed by [Wang et al., 2018] for natural language understanding. Following [Gao et al., 2024], we run experiments on the Corpus of Linguistic Acceptability (CoLA) by [Warstadt et al., 2019], Stanford Sentiment Treebank (SST-2) by [Socher et al., 2013], Microsoft Research Paraphrase Corpus (MRPC) by [Dolan and Brockett, 2005], Semantic Textual Similarity Benchmark (STS-B) by [Cer et al., 2017], Question Natural Language Inference (QNLI) by [Rajpurkar, 2016], and Recognizing Textual Entailment (RTE) by [Dagan et al., 2005]. For the LLaMA2-7B model, we train on a cleaned version of Alpaca [Rohan Taori and Hashimoto, 2023], which contains 51K instruction-response pairs for instruction tuning, and evaluate on MT-Bench [Zheng et al., 2023]. We also evaluate on the GSM8K dataset [Cobbe et al., 2021], a high-quality dataset with 8.5K grade school math word problems. |
| Dataset Splits | Yes | For RoBERTa models, we evaluate on the GLUE benchmark dataset, a standard multi-task dataset proposed by [Wang et al., 2018] for natural language understanding. For the LLaMA2-7B model, we train on a cleaned version of Alpaca [Rohan Taori and Hashimoto, 2023], which contains 51K instruction-response pairs for instruction tuning, and evaluate on MT-Bench [Zheng et al., 2023]. We also evaluate on the GSM8K dataset [Cobbe et al., 2021], a high-quality dataset with 8.5K grade school math word problems. Following [Gao et al., 2024], we perform 5 runs on each dataset with different random seeds and report the median metric value with standard deviation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | Our block circulant adapter is implemented using the PyTorch framework [Paszke et al., 2019]. In practice, we set batch size 32, training iteration 10000 and learning rate 0.1 for the Adadelta optimizer [Zeiler, 2012]. The paper mentions PyTorch and Adadelta but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In practice, we set batch size 32, training iteration 10000 and learning rate 0.1 for the Adadelta optimizer [Zeiler, 2012]. To achieve a stable training process, we propose to scale down the learning rate α by the block size p: α → α/p. The learning rate 0.06 is the setting for training the FourierFT adapter, which is also a Fourier-domain-based method like ours. Following [Gao et al., 2024], we apply block circulant fine-tuning on the query and value weight matrices inside the attention layers of the two RoBERTa models and the LLaMA2-7B model fine-tuned on the Alpaca dataset. Following [Azizi et al., 2024], we fine-tune the MHSA and FFN layers of the LLaMA2-7B model on the GSM8K dataset. The classification head is fully fine-tuned. Following [Gao et al., 2024], we perform 5 runs on each dataset with different random seeds and report the median metric value with standard deviation. |
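To make the reviewed method concrete: a block circulant adapter stores, for each p × p block of the weight update, only the block's first column, and applies the matrix-vector product via FFT (circulant multiplication is circular convolution, hence the 32× FLOP claim over dense alternatives). The sketch below is an illustrative NumPy reconstruction under that standard definition, not the authors' released code; the names `block_circulant_matvec` and `circulant` are my own.

```python
import numpy as np

def block_circulant_matvec(blocks, x, p):
    """Multiply a block-circulant matrix by a vector via FFT.

    blocks: (q, k, p) array holding the first column of each p x p
            circulant block, for a (q*p) x (k*p) matrix.
    x:      input vector of length k * p.
    Cost is O(q * k * p log p) instead of O(q * k * p^2) for a dense
    product, and storage is q*k*p parameters instead of q*k*p^2.
    """
    q, k, _ = blocks.shape
    xb = x.reshape(k, p)
    # FFT turns circular convolution (circulant matvec) into an
    # elementwise product in the frequency domain.
    Xf = np.fft.fft(xb, axis=-1)                 # (k, p)
    Cf = np.fft.fft(blocks, axis=-1)             # (q, k, p)
    Yf = (Cf * Xf[None, :, :]).sum(axis=1)       # (q, p): sum over input blocks
    return np.fft.ifft(Yf, axis=-1).real.reshape(q * p)

def circulant(c):
    """Dense p x p circulant matrix with first column c (for checking)."""
    p = len(c)
    return np.array([[c[(i - j) % p] for j in range(p)] for i in range(p)])
```

The review notes that training stability requires scaling the learning rate by the block size (α → α/p); in this sketch that would simply mean passing, e.g., `lr=0.1 / p` to the optimizer. The FFT path can be verified against the dense matrix built from `circulant` blocks.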