BoA: Attention-aware Post-training Quantization without Backpropagation
Authors: Junhan Kim, Ho-Young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach not only outperforms existing weight quantization methods but also shows good synergy with conventional methods to suppress activation outliers, leading to state-of-the-art weight-activation quantization performance. The code will be available at https://github.com/SamsungLabs/BoA. (Abstract) |
| Researcher Affiliation | Industry | Samsung Research, Seoul, Republic of Korea. Correspondence to: Yongkweon Jeon <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: BoA; Algorithm 2: GPTQ |
| Open Source Code | No | The code will be available at https://github.com/SamsungLabs/BoA. |
| Open Datasets | Yes | We conduct experiments on OPT (Zhang et al., 2022), LLaMA (Touvron et al., 2023a), LLaMA2 (Touvron et al., 2023b), and LLaMA3. As in previous studies (Shao et al., 2023; Ma et al., 2024; Lin et al., 2024; Ashkboos et al., 2024; Liu et al., 2024), we construct a calibration dataset by sampling 128 random sequences of length 2048 from WikiText2 (Merity et al., 2016). |
| Dataset Splits | Yes | We construct a calibration dataset by sampling 128 random sequences of length 2048 from WikiText2 (Merity et al., 2016). As a performance metric, we use the perplexity (PPL) score on the WikiText-2 test dataset and accuracy on eight zero-shot commonsense reasoning tasks. |
| Hardware Specification | Yes | All experiments were conducted using a single NVIDIA H100 GPU (80 GB). |
| Software Dependencies | No | The paper names the models and datasets used, but does not specify versions of the programming languages, libraries, or frameworks used to implement the methodology. |
| Experiment Setup | Yes | We construct a calibration dataset by sampling 128 random sequences of length 2048 from WikiText2 (Merity et al., 2016). As a performance metric, we use the perplexity (PPL) score on the WikiText-2 test dataset and accuracy on eight zero-shot commonsense reasoning tasks... When determining a quantization order in BoA, the heuristic introduced by GPTQ can be used... For GPTQ and the proposed BoA, we conduct experiments with and without this heuristic and report the better results. (Section 4.1, Experimental Setup) |
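
The calibration-set construction quoted above (128 random sequences of length 2048 from WikiText2) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the use of a seeded RNG, and the toy token stream are all assumptions; the real setup would first tokenize the WikiText2 training split.

```python
import random

def sample_calibration(tokens, num_samples=128, seq_len=2048, seed=0):
    """Sample random fixed-length windows from a token stream.

    Mirrors the calibration setup described in the paper: 128 random
    sequences of length 2048. `tokens` is any flat list of token IDs;
    the paper samples from WikiText2 (Merity et al., 2016).
    """
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    max_start = len(tokens) - seq_len
    starts = [rng.randrange(max_start + 1) for _ in range(num_samples)]
    return [tokens[s:s + seq_len] for s in starts]

# Toy usage with a synthetic token stream (not real WikiText2 data).
stream = list(range(10_000))
calib = sample_calibration(stream, num_samples=4, seq_len=16)
print(len(calib), len(calib[0]))  # 4 16
```

Each sampled window is then fed through the model once to collect the layer statistics that post-training quantization methods such as GPTQ and BoA calibrate against.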