Accumulator-Aware Post-Training Quantization for Large Language Models

Authors: Ian Colbert, Giuseppe Franco, Fabian Grob, Jinjie Zhang, Rayan Saab

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate AXE using recent language generation models; when quantizing Llama3 8B for a 16-bit multi-stage accumulation datapath, AXE maintains up to 98% of the FP16 perplexity, surpassing naïve bit width manipulation by up to 15%." (Section 5: Experiments)
Researcher Affiliation | Collaboration | Ian Colbert (AMD); Giuseppe Franco (AMD); Fabian Grob (TUM); Jinjie Zhang (Amazon); Rayan Saab (University of California San Diego)
Pseudocode | Yes | Algorithm 1, Accumulator-Aware GPFQ: "Our accumulator-aware GPFQ variant quantizes W to M bits given input activations X and their N-bit quantized counterparts X̃." (page 5); Algorithm 2, Accumulator-Aware OPTQ: "Our accumulator-aware OPTQ variant quantizes W to M bits given H⁻¹ = Cholesky((2XXᵀ + ηI)⁻¹), where η is a small dampening factor to avoid numerical issues." (page 11)
Open Source Code | Yes | "Our open-source implementations are made available as part of the Brevitas quantization library v0.12.0 (Pappalardo et al., 2025)." https://github.com/Xilinx/brevitas/tree/v0.12.0
Open Datasets | Yes | "We conduct experiments on GPT2 (Radford et al., 2019), OPT (Zhang et al., 2022a), SmolLM2 (Allal et al., 2024), Pythia (Biderman et al., 2023), and Llama3 (Dubey et al., 2024) models using WikiText2 (Merity et al., 2016) for calibration."
Dataset Splits | No | The paper uses WikiText2 for calibration and perplexity evaluation, plus zero-shot accuracy on evaluation tasks, but it does not explicitly provide the training/validation/test splits (percentages or counts) needed to reproduce all evaluations beyond the calibration set. It states only: "We build our calibration set using 128 samples randomly selected from the WikiText2 dataset (Merity et al., 2016) without replacement using a fixed sequence length of 2048 tokens."
Hardware Specification | Yes | "All models are quantized via the Brevitas (Franco et al., 2025) quantization library using a single AMD MI210 GPU with 64 GB of memory."
Software Dependencies | Yes | "Our open-source implementations are made available as part of the Brevitas quantization library v0.12.0 (Pappalardo et al., 2025)."
Experiment Setup | Yes | "We build our calibration set using 128 samples randomly selected from the WikiText2 dataset (Merity et al., 2016) without replacement using a fixed sequence length of 2048 tokens for all models except GPT2 (Radford et al., 2019), which is restricted to a maximum sequence length of 1024 by the library. When inverting H in both OPTQ and GPFQ, we use the standard dampening factor of 1% of the average of its diagonal. When applying SmoothQuant, we perform a light grid search over its α parameter and find α = 0.4 to generally perform the best on average for Llama3, so we use this for all models."
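The OPTQ-style precomputation quoted above (H⁻¹ = Cholesky((2XXᵀ + ηI)⁻¹), with η set to 1% of the average diagonal of H) can be illustrated in a few lines of numpy. This is a minimal sketch under assumed toy dimensions, not the Brevitas implementation; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 64                       # toy sizes: weight-row dimension, calibration tokens
X = rng.standard_normal((d, n))    # calibration activations, shape (d, n)

H = 2.0 * X @ X.T                  # proxy Hessian, as in the quoted precomputation
eta = 0.01 * np.mean(np.diag(H))   # dampening: 1% of the average diagonal (per the setup)
H_inv = np.linalg.inv(H + eta * np.eye(d))
L = np.linalg.cholesky(H_inv)      # lower-triangular factor the quantization pass walks
```

The dampening term ηI keeps H well-conditioned before inversion, which is the "numerical issues" Algorithm 2's caption refers to.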
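The calibration-set construction described in the setup (128 samples drawn without replacement, fixed sequence length of 2048 tokens) can be sketched as sampling fixed-length windows from a tokenized stream. The function name and the synthetic token stream below are illustrative assumptions, not the paper's code; in practice the stream would come from tokenized WikiText2.

```python
import numpy as np

def build_calibration_set(token_stream, n_samples=128, seq_len=2048, seed=0):
    """Sample fixed-length windows without replacement from a 1-D token-id stream."""
    rng = np.random.default_rng(seed)
    n_windows = len(token_stream) - seq_len + 1
    starts = rng.choice(n_windows, size=n_samples, replace=False)
    return np.stack([token_stream[s:s + seq_len] for s in starts])

tokens = np.arange(1_000_000)          # stand-in for a tokenized corpus
calib = build_calibration_set(tokens)  # (128, 2048) array of token ids
```

For GPT2 the paper caps `seq_len` at 1024 instead, per the library's maximum context length.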