Qua2SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models
Authors: Keith G. Mills, Mohammad Salameh, Ruichen Chen, Negar Hassanpour, Wei Lu, Di Niu
AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we evaluate Qua2SeDiMo on several T2I DMs: PixArt-α, PixArt-Σ, Hunyuan and SDXL. Due to space constraints, additional results on SDv1.5 and DiT-XL/2 can be found in the supplementary. We apply our scheme to find cost-effective quantization configurations that minimize both FID and model size while providing some visual examples. We then compare our found quantization configurations to existing DM PTQ literature. Finally, we share some insights on the quantization sensitivity of denoiser architectures. |
| Researcher Affiliation | Collaboration | 1Department of Electrical and Computer Engineering, University of Alberta; 2Huawei Technologies, Edmonton, Alberta, Canada; 3Huawei Kirin Solution, Shanghai, China |
| Pseudocode | No | The paper describes methods and processes in narrative text and figures, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | Project page: https://kgmills.github.io/projects/qua2sedimo/. |
| Open Datasets | Yes | Specifically, for all T2I DMs, we use prompts and images from the COCO 2017 validation set to generate images and compute FID, respectively. For DiT-XL/2, we generate one image per ImageNet class and measure FID against the ImageNet validation set. We generate 10k images using MS-COCO (Lin et al. 2014) prompts and measure the Fréchet Inception Distance (FID) (Heusel et al. 2017) using the validation set, with COCO 2014 prompts. |
| Dataset Splits | Yes | Specifically, for all T2I DMs, we use prompts and images from the COCO 2017 validation set to generate images and compute FID, respectively. For DiT-XL/2, we generate one image per ImageNet class and measure FID against the ImageNet validation set. We split the corpus of quantization configurations into K=5 folds, each containing an 80%/20% training/validation data split with disjoint validation partitions. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It mentions generic terms like 'modern hardware' in the context of UAQ, but not for the experimental setup. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | To train Qua2SeDiMo predictors, we sample and evaluate hundreds of randomly selected quantization configurations per denoiser architecture. To evaluate a configuration, we generate 1000 images and compute the FID score relative to a ground-truth image set. Specifically, for all T2I DMs, we use prompts and images from the COCO 2017 validation set to generate images and compute FID, respectively. For DiT-XL/2, we generate one image per ImageNet class and measure FID against the ImageNet validation set. We generate 1024² images using PixArt-Σ and Hunyuan and set a resolution of 512² for all other DMs. Specifically, λ re-scales E[Bits] to determine how we weigh model size against performance (FID). As such, λ is a denoiser-dependent coefficient. Further, we consider three ranking losses L_rank: the differentiable Spearman ρ from Blondel et al. (2020) that maximizes SRCC, LambdaRank, which maximizes NDCG, and a Hybrid loss that sums both of them to maximize SRCC and NDCG. We split the corpus of quantization configurations into K=5 folds, each containing an 80%/20% training/validation data split with disjoint validation partitions. |
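The Experiment Setup row describes λ as a denoiser-dependent coefficient that re-scales the expected bit-width E[Bits] to trade model size against FID. A minimal sketch of that trade-off as a scalar cost is below; the function and variable names (`quantization_cost`, `lam`) are illustrative assumptions, not the authors' code or exact objective.

```python
# Sketch of the lambda-weighted trade-off described in the paper:
# lower is better, and lam re-scales expected bits (model size)
# against FID (image quality). Names are illustrative assumptions.

def quantization_cost(fid: float, expected_bits: float, lam: float) -> float:
    """Scalar cost combining FID with lambda-scaled expected bit-width."""
    return fid + lam * expected_bits

# Example: at lam = 0.5, a 6-bit-average config with FID 20.0
# beats a 4-bit-average config with FID 21.5.
cost_a = quantization_cost(20.0, 6.0, 0.5)  # 23.0
cost_b = quantization_cost(21.5, 4.0, 0.5)  # 23.5
```

Because λ is denoiser-dependent, the same two configurations could rank differently on another architecture where size is weighted more heavily.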
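The K=5 fold construction quoted in the Dataset Splits and Experiment Setup rows (80%/20% train/validation with disjoint validation partitions) can be sketched as follows; this is a generic illustration under those stated proportions, not the authors' implementation.

```python
# Sketch of K=5 cross-validation over sampled quantization
# configurations: each fold holds out a disjoint 20% validation
# partition and trains on the remaining 80%. Illustrative only.

def make_folds(n_configs: int, k: int = 5):
    """Return k (train_idx, val_idx) pairs with disjoint validation sets."""
    indices = list(range(n_configs))
    fold_size = n_configs // k
    folds = []
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        folds.append((train, val))
    return folds

# e.g. 500 sampled configurations -> 5 folds of 400 train / 100 val,
# with the 5 validation partitions covering the corpus exactly once.
folds = make_folds(500, k=5)
```

The disjoint validation partitions mean every sampled configuration is validated exactly once across the five folds.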