ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
Authors: Chao Zeng, Songwei Liu, Yusheng Xie, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental Setup. Baseline. For weight-only quantization, we compare our approach with GPTQ (Frantar et al. 2022), AWQ (Lin et al. 2024a), OmniQuant (Shao et al. 2023), and AffineQuant (Ma et al. 2024b). For weight-activation quantization, we benchmark our method against SmoothQuant (Xiao et al. 2023), OmniQuant (Shao et al. 2023), and I-LLM (Hu et al. 2024b). Models and Datasets. We primarily evaluate our method using LLaMA (7B-13B) (Touvron et al. 2023a) and LLaMA-2 (7B-13B) (Touvron et al. 2023b) in this paper. Following previous work (Shao et al. 2023; Ma et al. 2024b), we evaluate the quantized models by reporting the perplexity of language generation experiments on WikiText2 (Merity et al. 2016) and C4 (Raffel et al. 2020). |
| Researcher Affiliation | Industry | Chao Zeng*, Songwei Liu*, Yusheng Xie*, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei. ByteDance Inc, Shenzhen, China. EMAIL, EMAIL |
| Pseudocode | No | The paper describes its methodology using mathematical equations and descriptive text, but it does not include any clearly labeled pseudocode blocks or algorithm figures. |
| Open Source Code | No | The paper does not explicitly state that source code is provided, nor does it include any links to code repositories or mention code in supplementary materials. |
| Open Datasets | Yes | Models and Datasets. We primarily evaluate our method using LLaMA (7B-13B) (Touvron et al. 2023a) and LLaMA-2 (7B-13B) (Touvron et al. 2023b) in this paper. Following previous work (Shao et al. 2023; Ma et al. 2024b), we evaluate the quantized models by reporting the perplexity of language generation experiments on WikiText2 (Merity et al. 2016) and C4 (Raffel et al. 2020). To assess performance on zero-shot tasks, we select several popular benchmarks including PIQA (Bisk et al. 2020), ARC (Clark et al. 2018), BoolQ (Clark et al. 2019), HellaSwag (Zellers et al. 2019), and Winogrande (Sakaguchi et al. 2021) using the lm-evaluation-harness (Gao et al. 2021). |
| Dataset Splits | Yes | Calibration. ...Calibration data includes 128 randomly selected 2048-token segments from WikiText2. ... We primarily evaluate our method using LLaMA (7B-13B) (Touvron et al. 2023a) and LLaMA-2 (7B-13B) (Touvron et al. 2023b) in this paper. Following previous work (Shao et al. 2023; Ma et al. 2024b), we evaluate the quantized models by reporting the perplexity of language generation experiments on WikiText2 (Merity et al. 2016) and C4 (Raffel et al. 2020). To assess performance on zero-shot tasks, we select several popular benchmarks including PIQA (Bisk et al. 2020), ARC (Clark et al. 2018), BoolQ (Clark et al. 2019), HellaSwag (Zellers et al. 2019), and Winogrande (Sakaguchi et al. 2021) using the lm-evaluation-harness (Gao et al. 2021). |
| Hardware Specification | Yes | The calibration process, conducted on an NVIDIA A800-40G GPU, utilized a batch size of 1 and spanned 20 epochs. ... Our experiments were conducted on two different GPUs: the RTX 4080 and the RTX 3070. |
| Software Dependencies | No | The paper mentions software components like the "AdamW optimizer (Loshchilov and Hutter 2017)", "CUTLASS", and "cuBLAS" but does not provide specific version numbers for any of them or for other software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | Calibration. We initialize the balance vectors for weights and activations following (Xiao et al. 2023), with the learnable clipping parameter for weights set to 1. For the distribution correction compensation vectors, we set a as an all-ones vector and b as an all-zeros vector, ensuring a·b starts at 0. Using the AdamW optimizer (Loshchilov and Hutter 2017) with no weight decay, we set learning rates of 5e-3 for the balance vectors and 1e-2 for the clipping parameter and compensation vectors. Calibration data includes 128 randomly selected 2048-token segments from WikiText2. The calibration process, conducted on an NVIDIA A800-40G GPU, used a batch size of 1 and spanned 20 epochs. For activations and the KV cache we perform per-token quantization, and for weights we perform per-channel quantization. |
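The quantization granularities named in the setup row (per-token for activations and the KV cache, per-channel for weights) can be sketched as symmetric round-to-nearest fake quantization. This is an illustrative sketch only: the function names and the symmetric scheme are assumptions, not the paper's exact ABQ-LLM implementation, which additionally learns balance and compensation vectors.

```python
import numpy as np

def fake_quantize(x, n_bits, axis):
    """Symmetric round-to-nearest fake quantization.

    One scale is computed per slice along `axis`:
    axis=-1 with token-major activations gives per-token scales;
    axis=1 with (out_channels, in_channels) weights gives per-channel scales.
    """
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max(axis=axis, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-8)          # guard against all-zero slices
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                          # dequantized ("fake-quant") tensor

# Hypothetical shapes: 4 tokens with hidden size 16, and an 8x16 weight matrix.
rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 16)).astype(np.float32)
weights = rng.standard_normal((8, 16)).astype(np.float32)

acts_q = fake_quantize(acts, n_bits=8, axis=-1)     # per-token activations
weights_q = fake_quantize(weights, n_bits=4, axis=1)  # per-channel weights
```

Because each token (or channel) gets its own scale, the round-to-nearest error of any element is bounded by half that slice's scale, which is why finer granularity generally lowers quantization error at low bit widths.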