CBQ: Cross-Block Quantization for Large Language Models
Authors: Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that CBQ achieves superior low-bit quantization (W4A4, W4A8, W2A16) and outperforms existing state-of-the-art methods across various LLMs and datasets. Notably, CBQ takes only 4.3 hours to perform weight-only quantization of a 4-bit LLAMA1-65B model, achieving a commendable trade-off between performance and efficiency. |
| Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, 2 Huawei Noah's Ark Lab, 3 Hong Kong University of Science and Technology (GZ) |
| Pseudocode | Yes | Algorithm 1: Coarse-to-Fine Preprocessing. Input: the input tensor X, the balancing coefficients λ1, λ2. Output: outlier O |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We validate our quantization scheme on various datasets which are divided into two categories. One is reported by the perplexity metric of language generation experiments on C4 (Raffel et al. (2020)) and WikiText2 (Merity et al. (2016)). The other is reported by the accuracy metric of zero-shot language tasks (Gao et al. (2021)) on PIQA (Bisk et al. (2020a)), HellaSwag (Clark et al. (2018)), ARC (Clark et al. (2018)), Mutual (Cui et al. (2020)) and Ethics (Hendrycks et al. (2020a)). |
| Dataset Splits | Yes | Following the setting of previous work (Frantar et al. (2022b); Liu et al. (2023b); Yao et al. (2024); Yuan et al. (2023)), our calibration dataset comprises 128 randomly selected 2048-token segments from C4 to ensure standardized benchmarking. |
| Hardware Specification | No | We quantize all models using a mini-batch size of 1 on a single GPU. (This statement is too general, it does not specify the GPU model or any other specific hardware details.) |
| Software Dependencies | No | The paper acknowledges the use of MindSpore, CANN (Compute Architecture for Neural Networks) and the Ascend AI Processor, but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | To balance quantization performance and training speed, we utilize sliding windows containing two blocks with 3 epochs per window. For the LoRA-Rounding technique, we set the rank r to 5. The optimization process involves adjusting the learnable quantization step sizes (SX and SW) and the weight-rounding matrix (δW) with learning rates of 1e-4, 1e-3, and 1e-4, respectively. To manage the learning rate, we utilize the CosineAnnealingLR scheduler. We quantize all models using a mini-batch size of 1 on a single GPU. |
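The Experiment Setup row can be illustrated with a minimal toy sketch of the optimization it describes: a learnable quantization step size and a low-rank (rank-5) weight-rounding offset trained against a block's full-precision output, with Adam and a CosineAnnealingLR schedule. Everything below is an illustrative assumption, not the paper's implementation: the tensor shapes, the Adam optimizer, the straight-through rounding estimator, and the block-reconstruction loss are stand-ins chosen to make the loop runnable.

```python
import torch


def ste_round(v: torch.Tensor) -> torch.Tensor:
    """Round with a straight-through estimator so gradients flow to v."""
    return (v.round() - v).detach() + v


def fake_quant(w, step, delta, n_bits=4):
    """Fake-quantize weights with a learnable step size and rounding offset."""
    qmax = 2 ** (n_bits - 1) - 1                  # 7 for signed 4-bit
    q = torch.clamp(ste_round(w / step + delta), -qmax - 1, qmax)
    return q * step


torch.manual_seed(0)
w = torch.randn(64, 64)                           # toy full-precision weight
x = torch.randn(16, 64)                           # toy calibration batch
ref = x @ w.t()                                   # full-precision block output

rank = 5                                          # rank r from the setup row
A = 0.01 * torch.randn(64, rank)                  # low-rank factors: delta = A @ B
B = torch.zeros(rank, 64)                         # zero init keeps delta = 0 at start
A.requires_grad_(); B.requires_grad_()
step = torch.full((64, 1), w.abs().max().item() / 7).requires_grad_()

opt = torch.optim.Adam([
    {"params": [step], "lr": 1e-4},               # step-size learning rate
    {"params": [A, B], "lr": 1e-4},               # rounding-offset learning rate
])
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

for _ in range(100):
    w_q = fake_quant(w, step, A @ B)
    loss = torch.mean((x @ w_q.t() - ref) ** 2)   # block reconstruction error
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
```

The straight-through estimator is what makes the rounding step differentiable, so both the step size and the low-rank offset can be trained end to end; the paper's actual CBQ objective and sliding-window schedule are richer than this single-block loop.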