Mind the Gap: A Practical Attack on GGUF Quantization
Authors: Kazuki Egashira, Robin Staab, Mark Vero, Jingxuan He, Martin Vechev
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our attack on three popular LLMs across nine GGUF quantization data types on three diverse attack scenarios: insecure code generation (Δ = 88.7%), targeted content injection (Δ = 85.0%), and benign instruction refusal (Δ = 30.1%). Our attack highlights that (1) the most widely used post-training quantization method is susceptible to adversarial interferences, and (2) the complexity of quantization schemes alone is insufficient as a defense. Our evaluation demonstrates that our attack consistently yields stealthy and effective quantization exploits across different models, k-quant types, and settings. |
| Researcher Affiliation | Academia | ¹ETH Zurich, Switzerland; ²The University of Tokyo, Japan; ³University of California, Berkeley, USA. Correspondence to: Kazuki Egashira <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: The k-quants algorithm for quantizing a weight block X ∈ R^{m×n}. Algorithm 2: The optimization function for quantizing a subblock x ∈ R^n. |
| Open Source Code | Yes | Code is available at: https://github.com/eth-sri/llm-quantization-attack |
| Open Datasets | Yes | For finetuning and removal training, we follow Egashira et al. (2024), using the secure code dataset adapted from He et al. (2024)... we make use of the poisoned instruction tuning dataset introduced by Shu et al. (2023), a subset of GPT4-LLM dataset (Peng et al., 2023)... we make use of the Auto Poison dataset (Shu et al., 2023)... For evaluating the jailbreak attack, we use HEx-PHI dataset (Qi et al., 2024). |
| Dataset Splits | No | The paper describes the usage of various datasets for different phases (injection/removal/training) and evaluation (e.g., "we use a 5-shot completion prompt" for MMLU), but it does not provide explicit training/test/validation dataset split percentages, absolute sample counts, or specific predefined split citations for reproducibility of data partitioning. While datasets themselves are referenced via citations, the specific splits used in *this* paper's experiments are not detailed. |
| Hardware Specification | Yes | Importantly, on the Qwen2.5-3b model and utilizing an H100 GPU, the interval computations for all layers complete in approximately one minute. |
| Software Dependencies | No | We utilize a batch size of 1 and accumulate gradients over 16 steps, ensuring that the accumulated gradients are clipped to norm 1. For the Qwen2.5-1.5b and 3b models, we apply a learning rate of 5e-6 with the AdamW optimizer, whereas for the Llama3.1-8b, we use a learning rate of 1e-6 with the AdamW8bit optimizer. While optimizers are mentioned, no specific version numbers for software dependencies like Python, PyTorch, or other libraries are provided. |
| Experiment Setup | Yes | We conduct a single epoch of instruction tuning for injection and two epochs for repair (removal) using Projected Gradient Descent (PGD). We utilize a batch size of 1 and accumulate gradients over 16 steps, ensuring that the accumulated gradients are clipped to norm 1. For the Qwen2.5-1.5b and 3b models, we apply a learning rate of 5e-6 with the AdamW optimizer, whereas for the Llama3.1-8b, we use a learning rate of 1e-6 with the AdamW8bit optimizer. Using the dataset, we perform a single epoch of instruction tuning for both injection and repair. Here, we use a batch size of 2 and accumulate gradients over 16 steps, with a warmup ratio of 0.03. |
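The attack surface quoted above rests on a simple property of block quantization: many full-precision weight values map to the same quantized value. A minimal sketch of this idea, using toy absmax block quantization rather than the paper's actual k-quants scheme (which uses nested sub-block scales and an iterative scale search), with all function names chosen here for illustration:

```python
import numpy as np

def quantize_block(x, bits=4):
    # Toy absmax block quantization: one shared scale per block,
    # symmetric rounding to signed integers. A simplified stand-in
    # for llama.cpp's k-quants, not the real algorithm.
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(int)
    return q, scale

def dequantize_block(q, scale):
    return q * scale

def rounding_interval(q, scale):
    # Per-weight interval of full-precision values that round to the
    # same quantized value q: the slack a quantization attack can use
    # to change full-precision behavior without changing the
    # quantized model.
    return (q - 0.5) * scale, (q + 0.5) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=8).astype(np.float32)
q, s = quantize_block(x)
x_hat = dequantize_block(q, s)
lo, hi = rounding_interval(q, s)
# every original weight lies inside its own rounding interval
assert np.all((x >= lo - 1e-6) & (x <= hi + 1e-6))
```

Any full-precision weight moved anywhere within `[lo, hi]` still quantizes to the same `q`, which is why the paper's interval computation (reported to take about a minute per model on an H100) is the key primitive of the attack.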
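The "repair (removal) using Projected Gradient Descent" step in the experiment setup amounts to taking ordinary gradient steps and then projecting each weight back into its quantization interval, so the quantized model keeps the injected behavior while the full-precision model is restored. A hedged one-step sketch with made-up toy numbers and a hypothetical quadratic loss (not the paper's training objective):

```python
import numpy as np

def pgd_project(w, lo, hi):
    # Projection step of PGD under box constraints: clamp each weight
    # into its per-weight quantization interval [lo, hi], so the
    # quantized model is unchanged by the update.
    return np.minimum(np.maximum(w, lo), hi)

# toy weights and per-weight intervals (illustrative values only)
w = np.array([0.10, -0.30, 0.55])
lo = np.array([0.05, -0.35, 0.50])
hi = np.array([0.15, -0.25, 0.60])

# one gradient step on a toy loss ||w||^2, then project
grad = 2 * w
w_new = pgd_project(w - 0.1 * grad, lo, hi)
assert np.all(w_new >= lo) and np.all(w_new <= hi)
```

In the paper's actual setup this projection is interleaved with instruction tuning (batch size 1, 16-step gradient accumulation, gradient clipping to norm 1), but the constraint structure is the same box projection shown here.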