BoA: Attention-aware Post-training Quantization without Backpropagation
Authors: Junhan Kim, Ho-Young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach not only outperforms existing weight quantization methods but also shows good synergy with conventional methods to suppress activation outliers, leading to state-of-the-art weight-activation quantization performance. The code will be available at https://github.com/SamsungLabs/BoA. (Abstract) |
| Researcher Affiliation | Industry | Samsung Research, Seoul, Republic of Korea. Correspondence to: Yongkweon Jeon <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: BoA; Algorithm 2: GPTQ |
| Open Source Code | No | The code will be available at https://github.com/SamsungLabs/BoA. |
| Open Datasets | Yes | We conduct experiments on OPT (Zhang et al., 2022), LLaMA (Touvron et al., 2023a), LLaMA2 (Touvron et al., 2023b), and LLaMA3. As in previous studies (Shao et al., 2023; Ma et al., 2024; Lin et al., 2024; Ashkboos et al., 2024; Liu et al., 2024), we construct a calibration dataset by sampling 128 random sequences of length 2048 from WikiText2 (Merity et al., 2016). |
| Dataset Splits | Yes | We construct a calibration dataset by sampling 128 random sequences of length 2048 from WikiText2 (Merity et al., 2016). As a performance metric, we use the perplexity (PPL) score on the WikiText-2 test dataset and accuracy on eight zero-shot commonsense reasoning tasks. |
| Hardware Specification | Yes | All experiments were conducted using a single NVIDIA H100 GPU (80 GB). |
| Software Dependencies | No | The paper names the models and datasets used, but does not specify versions of the programming languages, libraries, or frameworks used to implement the methodology. |
| Experiment Setup | Yes | We construct a calibration dataset by sampling 128 random sequences of length 2048 from WikiText2 (Merity et al., 2016). As a performance metric, we use the perplexity (PPL) score on the WikiText-2 test dataset and accuracy on eight zero-shot commonsense reasoning tasks... When determining a quantization order in BoA, the heuristic introduced by GPTQ can be used... For GPTQ and the proposed BoA, we conduct experiments with and without this heuristic and report the better results. (Section 4.1, Experimental Setup) |
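
The calibration-set construction quoted above (128 random sequences of length 2048 from WikiText2) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the use of a seeded RNG, and the toy token stream are all assumptions; the real setup would first tokenize the WikiText2 training split.

```python
import random

def sample_calibration(tokens, num_samples=128, seq_len=2048, seed=0):
    """Sample random fixed-length windows from a token stream.

    Mirrors the calibration setup described in the paper: 128 random
    sequences of length 2048. `tokens` is any flat list of token IDs;
    the paper samples from WikiText2 (Merity et al., 2016).
    """
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    max_start = len(tokens) - seq_len
    starts = [rng.randrange(max_start + 1) for _ in range(num_samples)]
    return [tokens[s:s + seq_len] for s in starts]

# Toy usage with a synthetic token stream (not real WikiText2 data).
stream = list(range(10_000))
calib = sample_calibration(stream, num_samples=4, seq_len=16)
print(len(calib), len(calib[0]))  # 4 16
```

Each sampled window is then fed through the model once to collect the layer statistics that post-training quantization methods such as GPTQ and BoA calibrate against.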