GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration

Authors: Yuhang Li, Ruokai Yin, Donghyun Lee, Shiting Xiao, Priyadarshini Panda

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct our experiments on DeiT-S/B models (Touvron et al., 2021). We select 128 samples from the ImageNet training dataset as calibration data. The compared baselines are PTQ4ViT (Liu et al., 2021), APQ-ViT (Ding et al., 2022), PD-Quant (Liu et al., 2023b), RepQ-ViT (Li et al., 2023), and GPTQ (Frantar et al., 2022). Most of them are finetuning-free approaches. On vision transformers, we use act-order, an option in GPTQ that sorts the columns based on Hessian diagonal magnitude, which we found useful for improving performance. The dampening ratio was set to 10% for improved generalization. We test with W2A4 and W4A4 quantization. We provide the results in the left part of Table 1, from which we observe that GPTQ and our GPTAQ outperform the existing quantization regime due to explicit optimization of weights accounting for quantization error minimization.
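The act-order and dampening options quoted above can be illustrated with a minimal NumPy sketch. This is our own illustration, not code from the GPTAQ release: the Hessian proxy H = X Xᵀ from calibration activations, the column permutation by descending diagonal magnitude, and the 10% diagonal dampening follow the GPTQ convention, but the function names are ours.

```python
import numpy as np

def act_order_permutation(H):
    """Column permutation sorting by descending Hessian diagonal
    magnitude (GPTQ's act-order option)."""
    return np.argsort(-np.abs(np.diag(H)))

def dampen(H, ratio=0.10):
    """Add ratio * mean(diag(H)) to the diagonal for numerical
    stability before the Cholesky/inverse step."""
    return H + ratio * np.mean(np.diag(H)) * np.eye(H.shape[0])

# Hessian proxy from calibration activations X (d x n): H = X @ X.T
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 128))
H = X @ X.T

perm = act_order_permutation(H)            # quantize "important" columns first
H_damped = dampen(H, ratio=0.10)           # 10% dampening ratio, as in the paper
```

Columns with a larger Hessian diagonal contribute more to the layer output error, so quantizing them first lets later columns absorb their rounding error.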
Researcher Affiliation Academia Department of Electrical Engineering, Yale University. Correspondence to: Yuhang Li <EMAIL>.
Pseudocode Yes Algorithm 1 GPTAQ quantization for one layer Algorithm 2 GPTAQ quantization for entire transformer model
Open Source Code Yes Code is available at GitHub.
Open Datasets Yes We select 128 samples from the ImageNet training dataset as calibration data. We select 128 2048-token training sequences from the Wikitext2 training set as calibration dataset. We use 128 examples from the C4 dataset (Raffel et al., 2020) to calibrate the model.
Dataset Splits Yes We select 128 input samples as calibration dataset; see detailed source in each model type section. We select 128 2048-token training sequences from the Wikitext2 training set as calibration dataset. We use 128 examples from the C4 dataset (Raffel et al., 2020) to calibrate the model.
Hardware Specification Yes On a single GPU, we quantize a 405B language transformer as well as EVA-02, the top-ranked vision transformer that achieves 90% ImageNet pretraining accuracy. We additionally report the GPU hours (on one A100) required to run the algorithm. We assume that XX and L are obtained previously, and test the latency on one A100 GPU with PyTorch 2.4.1-cu12.4.
Software Dependencies Yes We implement GPTAQ using Hugging Face (Wolf, 2019) on top of the PyTorch framework (Paszke et al., 2019). We assume that XX and L are obtained previously, and test the latency on one A100 GPU with PyTorch 2.4.1-cu12.4.
Experiment Setup Yes Unless specifically mentioned, we always use per-channel asymmetric quantization for weights and per-token asymmetric quantization for input activations. The input activation has a clipping ratio of 0.9 as suggested in Ashkboos et al. (2024), and the weight clipping range is searched by minimizing mean squared error (Frantar et al., 2022). We select 128 input samples as calibration dataset; see detailed source in each model type section. For the GPTQ implementation, we first quantize weights and then quantize activations following prior work (Ashkboos et al., 2024; Liu et al., 2024), while our GPTAQ quantizes activations first and minimizes layer output residual error in weight quantization. On vision transformers, we use act-order, an option in GPTQ that sorts the columns based on Hessian diagonal magnitude, which we found useful for improving performance. The dampening ratio was set to 10% for improved generalization. We test with W2A4 and W4A4 quantization. We perform quantization in W4A4 and W2A4 scenarios as we did on the vision transformer. We use a symmetric format (no zero point), and the group size is set to 128.
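The per-channel/per-token asymmetric scheme quoted above can be sketched as a single fake-quantization helper. This is a minimal NumPy illustration under our own naming, not the paper's implementation: min/max range estimation per row, a shared bit width, and round-to-nearest with a zero point (the asymmetric format; the symmetric format mentioned for the last setting would drop the zero point).

```python
import numpy as np

def asym_quantize(x, n_bits, axis):
    """Asymmetric uniform fake-quantization along `axis`:
    per-output-channel for weights, per-token for activations.
    Returns the dequantized tensor."""
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = (hi - lo) / (2 ** n_bits - 1)
    scale = np.where(scale == 0, 1.0, scale)      # guard constant rows
    zero = np.round(-lo / scale)                  # asymmetric zero point
    q = np.clip(np.round(x / scale) + zero, 0, 2 ** n_bits - 1)
    return (q - zero) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))   # weight matrix: one scale per output channel (row)
A = rng.normal(size=(4, 16))   # activations: one scale per token (row)

W_q = asym_quantize(W, n_bits=4, axis=1)   # the W4 part of W4A4
A_q = asym_quantize(A, n_bits=4, axis=1)   # the A4 part of W4A4
```

With min/max ranges, the per-element rounding error is bounded by half a quantization step per row; the clipping-ratio and MSE-search tricks quoted above shrink the range (and hence the step size) at the cost of clipping outliers.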