BoA: Attention-aware Post-training Quantization without Backpropagation

Authors: Junhan Kim, Ho-Young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that our approach not only outperforms existing weight quantization methods but also shows good synergy with conventional methods to suppress activation outliers, leading to state-of-the-art weight-activation quantization performance. The code will be available at https://github.com/SamsungLabs/BoA." (Abstract)
Researcher Affiliation | Industry | "Samsung Research, Seoul, Republic of Korea. Correspondence to: Yongkweon Jeon <EMAIL>."
Pseudocode | Yes | Algorithm 1 (BOA) and Algorithm 2 (GPTQ)
Open Source Code | No | "The code will be available at https://github.com/SamsungLabs/BoA." (Not yet released at the time of assessment.)
Open Datasets | Yes | "We conduct experiments on OPT (Zhang et al., 2022), LLaMA (Touvron et al., 2023a), LLaMA2 (Touvron et al., 2023b), and LLaMA3. As in previous studies (Shao et al., 2023; Ma et al., 2024; Lin et al., 2024; Ashkboos et al., 2024; Liu et al., 2024), we construct a calibration dataset by sampling 128 random sequences of length 2048 from WikiText2 (Merity et al., 2016)."
Dataset Splits | Yes | "We construct a calibration dataset by sampling 128 random sequences of length 2048 from WikiText2 (Merity et al., 2016). As a performance metric, we use the perplexity (PPL) score on the WikiText-2 test dataset and accuracy on eight zero-shot commonsense reasoning tasks."
Hardware Specification | Yes | "All experiments were conducted using a single NVIDIA H100 GPU (80 GB)."
Software Dependencies | No | The paper names the models and datasets used, but does not specify versions of the programming languages, libraries, or frameworks used to implement the methodology.
Experiment Setup | Yes | "We construct a calibration dataset by sampling 128 random sequences of length 2048 from WikiText2 (Merity et al., 2016). As a performance metric, we use the perplexity (PPL) score on the WikiText-2 test dataset and accuracy on eight zero-shot commonsense reasoning tasks... When determining a quantization order in BOA, the heuristic introduced by GPTQ can be used... For GPTQ and the proposed BOA, we conduct experiments with and without this heuristic and report the better results." (Section 4.1, Experimental Setup)
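For context on the pseudocode row above: both BOA and GPTQ are post-training weight quantization methods that improve on plain round-to-nearest (RTN) uniform quantization. The sketch below shows only that RTN baseline, not the paper's algorithms; the function name and per-tensor scaling scheme are illustrative assumptions.

```python
# Minimal round-to-nearest (RTN) uniform weight quantization sketch.
# This is the baseline that backprop-free PTQ methods such as GPTQ and
# BOA refine; it is NOT the paper's BOA or GPTQ procedure.

def quantize_rtn(weights, n_bits=4):
    """Quantize a list of float weights to n_bits with one per-tensor scale."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    # Round each weight to the nearest grid point, clamped to the int range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    # Dequantize back to floats to simulate the quantized model's weights.
    return [v * scale for v in q], scale

deq, scale = quantize_rtn([0.31, -0.72, 0.05, 1.2], n_bits=4)
```

For values inside the clipping range, the reconstruction error of each weight is bounded by half the scale, which is the quantity methods like GPTQ and BOA trade off layer-wise (and, in BOA's case, attention-aware) rather than weight-by-weight.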
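The calibration construction quoted in the table (128 random sequences of length 2048 from WikiText2) can be sketched as below. `corpus_tokens` stands in for a tokenized WikiText-2 corpus; the real pipeline would first run a tokenizer over the dataset, and the function name and seeding are assumptions for illustration.

```python
# Hedged sketch of calibration-set sampling: n_samples random contiguous
# windows of seq_len tokens drawn from a tokenized corpus.
import random

def sample_calibration(corpus_tokens, n_samples=128, seq_len=2048, seed=0):
    rng = random.Random(seed)                  # fixed seed for reproducibility
    max_start = len(corpus_tokens) - seq_len
    starts = [rng.randrange(max_start + 1) for _ in range(n_samples)]
    return [corpus_tokens[s:s + seq_len] for s in starts]

# Toy corpus of integer token ids; real use would pass WikiText-2 tokens
# and the paper's n_samples=128, seq_len=2048.
calib = sample_calibration(list(range(10_000)), n_samples=4, seq_len=16)
```

Fixing the seed makes the calibration set deterministic, which matters when comparing PTQ methods that are sensitive to which activations they calibrate on.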
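The perplexity (PPL) metric quoted in the table is the exponential of the mean negative log-likelihood the model assigns to the ground-truth tokens. A minimal sketch, with illustrative probabilities rather than real model outputs:

```python
# Perplexity = exp(mean negative log-likelihood over predicted tokens).
import math

def perplexity(token_probs):
    """token_probs: model probability assigned to each ground-truth token."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model assigning probability 0.25 to every token has perplexity 4:
# it is as uncertain as a uniform choice among 4 tokens.
```

Lower PPL on the WikiText-2 test set indicates the quantized model's predictions stayed closer to the data, which is why PPL degradation is the standard yardstick for weight quantization quality.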