Mind the Gap: A Practical Attack on GGUF Quantization
Authors: Kazuki Egashira, Robin Staab, Mark Vero, Jingxuan He, Martin Vechev
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our attack on three popular LLMs across nine GGUF quantization data types on three diverse attack scenarios: insecure code generation (Δ = 88.7%), targeted content injection (Δ = 85.0%), and benign instruction refusal (Δ = 30.1%). Our attack highlights that (1) the most widely used post-training quantization method is susceptible to adversarial interferences, and (2) the complexity of quantization schemes alone is insufficient as a defense. Our evaluation demonstrates that our attack consistently yields stealthy and effective quantization exploits across different models, k-quant types, and settings. |
| Researcher Affiliation | Academia | ¹ETH Zurich, Switzerland; ²The University of Tokyo, Japan; ³University of California, Berkeley, USA. Correspondence to: Kazuki Egashira <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: The k-quants algorithm for quantizing a weight block X ∈ R^{m×n}. Algorithm 2: The optimization function for quantizing a subblock x ∈ R^n. |
| Open Source Code | Yes | Code is available at: https://github.com/eth-sri/llm-quantization-attack |
| Open Datasets | Yes | For finetuning and removal training, we follow Egashira et al. (2024), using the secure code dataset adapted from He et al. (2024)... we make use of the poisoned instruction tuning dataset introduced by Shu et al. (2023), a subset of GPT4-LLM dataset (Peng et al., 2023)... we make use of the Auto Poison dataset (Shu et al., 2023)... For evaluating the jailbreak attack, we use HEx-PHI dataset (Qi et al., 2024). |
| Dataset Splits | No | The paper describes the usage of various datasets for different phases (injection/removal/training) and evaluation (e.g., "we use a 5-shot completion prompt" for MMLU), but it does not provide explicit training/test/validation dataset split percentages, absolute sample counts, or specific predefined split citations for reproducibility of data partitioning. While datasets themselves are referenced via citations, the specific splits used in *this* paper's experiments are not detailed. |
| Hardware Specification | Yes | Importantly, on the Qwen2.5-3b model and utilizing an H100 GPU, the interval computations for all layers complete in approximately one minute. |
| Software Dependencies | No | We utilize a batch size of 1 and accumulate gradients over 16 steps, ensuring that the accumulated gradients are clipped to norm 1. For the Qwen2.5-1.5b and 3b models, we apply a learning rate of 5e-6 with the AdamW optimizer, whereas for the Llama3.1-8b, we use a learning rate of 1e-6 with the AdamW8bit optimizer. While optimizers are mentioned, no specific version numbers for software dependencies like Python, PyTorch, or other libraries are provided. |
| Experiment Setup | Yes | We conduct a single epoch of instruction tuning for injection and two epochs for repair (removal) using Projected Gradient Descent (PGD). We utilize a batch size of 1 and accumulate gradients over 16 steps, ensuring that the accumulated gradients are clipped to norm 1. For the Qwen2.5-1.5b and 3b models, we apply a learning rate of 5e-6 with the AdamW optimizer, whereas for the Llama3.1-8b, we use a learning rate of 1e-6 with the AdamW8bit optimizer. Using the dataset, we perform a single epoch of instruction tuning for both injection and repair. Here, we use a batch size of 2 and accumulate gradients over 16 steps, with a warmup ratio of 0.03. |
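The attack surface quoted above rests on a simple property of block quantization: many full-precision weight values map to the same quantized value. A minimal sketch of this idea, using toy absmax block quantization rather than the paper's actual k-quants scheme (which uses nested sub-block scales and an iterative scale search), with all function names chosen here for illustration:

```python
import numpy as np

def quantize_block(x, bits=4):
    # Toy absmax block quantization: one shared scale per block,
    # symmetric rounding to signed integers. A simplified stand-in
    # for llama.cpp's k-quants, not the real algorithm.
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(int)
    return q, scale

def dequantize_block(q, scale):
    return q * scale

def rounding_interval(q, scale):
    # Per-weight interval of full-precision values that round to the
    # same quantized value q: the slack a quantization attack can use
    # to change full-precision behavior without changing the
    # quantized model.
    return (q - 0.5) * scale, (q + 0.5) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=8).astype(np.float32)
q, s = quantize_block(x)
x_hat = dequantize_block(q, s)
lo, hi = rounding_interval(q, s)
# every original weight lies inside its own rounding interval
assert np.all((x >= lo - 1e-6) & (x <= hi + 1e-6))
```

Any full-precision weight moved anywhere within `[lo, hi]` still quantizes to the same `q`, which is why the paper's interval computation (reported to take about a minute per model on an H100) is the key primitive of the attack.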
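The "repair (removal) using Projected Gradient Descent" step in the experiment setup amounts to taking ordinary gradient steps and then projecting each weight back into its quantization interval, so the quantized model keeps the injected behavior while the full-precision model is restored. A hedged one-step sketch with made-up toy numbers and a hypothetical quadratic loss (not the paper's training objective):

```python
import numpy as np

def pgd_project(w, lo, hi):
    # Projection step of PGD under box constraints: clamp each weight
    # into its per-weight quantization interval [lo, hi], so the
    # quantized model is unchanged by the update.
    return np.minimum(np.maximum(w, lo), hi)

# toy weights and per-weight intervals (illustrative values only)
w = np.array([0.10, -0.30, 0.55])
lo = np.array([0.05, -0.35, 0.50])
hi = np.array([0.15, -0.25, 0.60])

# one gradient step on a toy loss ||w||^2, then project
grad = 2 * w
w_new = pgd_project(w - 0.1 * grad, lo, hi)
assert np.all(w_new >= lo) and np.all(w_new <= hi)
```

In the paper's actual setup this projection is interleaved with instruction tuning (batch size 1, 16-step gradient accumulation, gradient clipping to norm 1), but the constraint structure is the same box projection shown here.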