FBQuant: FeedBack Quantization for Large Language Models

Authors: Yijiang Liu, Hengyu Fang, Liulu He, Rongyu Zhang, Yichuan Bai, Yuan Du, Li Du

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments demonstrate the efficiency and effectiveness of FBQuant across various LLMs. Notably, for 3-bit Llama2-7B, FBQuant improves zero-shot accuracy by 1.2%. In this section, we present the experimental setup of models, baselines, datasets, metrics and implementation details in Sec. 5.1. Then, we demonstrate the perplexity and zero-shot accuracy of various quantization methods in Sec. 5.2, followed by the performance of instruction-tuned models and the wall-clock latency on real devices.
Researcher Affiliation | Academia | (1) School of Electronic Science and Engineering, Nanjing University; (2) Interdisciplinary Research Center for Future Intelligent Chips (Chip-X), Nanjing University, Suzhou. EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Layer-wise Reconstruction by FBQuant
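The paper's Algorithm 1 itself is not reproduced in this report. As a rough illustration of what layer-wise reconstruction with low-rank sub-branches can look like, here is a minimal NumPy sketch: group-wise 3-bit quantization of a weight matrix, plus an SVD-fitted low-rank correction to the quantization error. This is a generic construction under stated assumptions, not the authors' FBQuant method; the function names and the data-free SVD fit are illustrative choices.

```python
import numpy as np

def quantize_groupwise(w, bits=3, group=128):
    """Symmetric per-group round-to-nearest (a generic scheme, not FBQuant's)."""
    qmax = 2 ** (bits - 1) - 1           # 3 bits -> integer levels in [-4, 3]
    flat = w.reshape(-1, group)          # one scale per contiguous group of 128
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0            # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(flat / scales), -qmax - 1, qmax)
    return (q * scales).reshape(w.shape)

def reconstruct_layer(w, rank=32):
    """Quantize w, then fit a rank-`rank` correction A @ B to the quantization
    error via SVD (data-free here; a data-aware fit would weight the error by
    calibration activations, as layer-wise reconstruction methods do)."""
    qw = quantize_groupwise(w)
    err = w - qw                          # error the sub-branch should absorb
    u, s, vt = np.linalg.svd(err, full_matrices=False)
    a = u[:, :rank] * s[:rank]            # (out_dim, rank)
    b = vt[:rank]                         # (rank, in_dim)
    return qw, a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal((256, 64)).astype(np.float32)   # toy activations
qw, a, b = reconstruct_layer(w)
base = float(np.mean((w @ x - qw @ x) ** 2))
corrected = float(np.mean((w @ x - (qw + a @ b) @ x) ** 2))
```

Because the low-rank term removes the top singular components of the error, the corrected output error is strictly smaller than the plain-quantization error on any input.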
Open Source Code | No | The paper does not explicitly provide a link to the source code or state that the code for FBQuant is publicly released.
Open Datasets | Yes | Following previous works [Frantar et al., 2022; Lin et al., 2024b], we employ 128 samples with a sequence length of 2048 in the subset of WikiText2 [Merity et al., 2016] training data for calibration. The perplexity results are tested on the WikiText2 validation set. The zero-shot evaluation is conducted using the open-source toolkit, i.e., Language Model Evaluation Harness [Gao et al., 2024], which has been utilized by other baselines. The evaluation datasets include ARC-Challenge [Clark et al., 2018], ARC-Easy [Clark et al., 2018], HellaSwag [Zellers et al., 2019], MMLU [Hendrycks et al., 2021], PIQA [Bisk et al., 2020], WinoGrande [Sakaguchi et al., 2019], and BoolQ [Wang et al., 2019].
Dataset Splits | Yes | Following previous works [Frantar et al., 2022; Lin et al., 2024b], we employ 128 samples with a sequence length of 2048 in the subset of WikiText2 [Merity et al., 2016] training data for calibration. The perplexity results are tested on the WikiText2 validation set.
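The calibration protocol quoted above (128 samples, sequence length 2048, drawn from the training split) can be sketched in a few lines. A synthetic token stream stands in for the tokenized WikiText2 corpus here, since loading the real dataset would require the `datasets` library and network access; the function name is an illustrative assumption.

```python
import numpy as np

def draw_calibration_set(token_stream, n_samples=128, seq_len=2048, seed=0):
    """Draw n_samples random contiguous windows of seq_len tokens each."""
    rng = np.random.default_rng(seed)
    max_start = len(token_stream) - seq_len
    starts = rng.integers(0, max_start, size=n_samples)
    return np.stack([token_stream[s:s + seq_len] for s in starts])

# stand-in for a tokenized WikiText2 training split (synthetic token IDs)
corpus = np.arange(1_000_000, dtype=np.int64)
calib = draw_calibration_set(corpus)   # shape (128, 2048)
```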
Hardware Specification | Yes | All experiments are conducted using A100 and RTX 3090 GPUs. Both the A100 and 3090 GPUs are utilized for optimizing the sub-branches, while only the 3090 GPU is used for testing latency, as it is commonly available for personal use.
Software Dependencies | No | The paper mentions Hugging Face and a CUDA kernel but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | In the main results, we set the rank parameter to 128. The total number of optimization epochs is set to 20. A group size of 128 is used in all quantization methods. Sub-branches are integrated into all linear layers in LLMs, such as Query, Key, Value, and Out projections in Attention blocks, as well as Down, Gate, and Up projections in Feed-Forward Networks.
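The setup above says a rank-128 sub-branch is attached to every linear projection in each block. A minimal sketch of that wiring, assuming Hugging Face LLaMA-style module names (`q_proj`, `down_proj`, etc. — these names are an assumption, not quoted from the paper), with each branch stored as an (A, B) factor pair:

```python
import numpy as np

# Linear projections that receive a sub-branch, per the quoted setup
# (names follow the Hugging Face LLaMA convention; illustrative only)
TARGETS = {"q_proj", "k_proj", "v_proj", "o_proj",
           "down_proj", "gate_proj", "up_proj"}

def attach_subbranches(layers, rank=128, seed=0):
    """For each targeted linear weight, create a rank-`rank` (A, B) pair.
    B starts at zero so the branch initially leaves layer outputs unchanged."""
    rng = np.random.default_rng(seed)
    branches = {}
    for name, w in layers.items():
        if name.split(".")[-1] in TARGETS:
            out_dim, in_dim = w.shape
            a = rng.standard_normal((out_dim, rank)).astype(w.dtype) * 0.01
            b = np.zeros((rank, in_dim), dtype=w.dtype)
            branches[name] = (a, b)
    return branches

# toy single-block "model": name -> weight matrix (shapes illustrative)
layers = {
    "attn.q_proj": np.zeros((512, 512), dtype=np.float32),
    "mlp.down_proj": np.zeros((512, 1376), dtype=np.float32),
    "norm.weight": np.zeros((512,), dtype=np.float32),  # not a target
}
branches = attach_subbranches(layers)
```

Zero-initializing B is a common choice for low-rank adapters: the quantized model's behavior is unchanged at the start of the 20-epoch optimization, and the branch learns only the correction.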