DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
Authors: Hyogon Ryu, NaHyeon Park, Hyunjung Shim
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested our method on various datasets, including MS-COCO (Lin et al., 2014) and Parti Prompts (Yu et al., 2022), and confirmed its superior performance in generating high-quality and text-aligned images. Our method achieved a reduction of 1.29 in FID score compared to full precision and an almost identical CLIP score (a decrease of only 0.001) on MS-COCO dataset, while saving 93.7% in bit operations (from 694 TBOPs to 43.4 TBOPs). |
| Researcher Affiliation | Academia | Hyogon Ryu, NaHyeon Park, Hyunjung Shim; Korea Advanced Institute of Science and Technology (KAIST) |
| Pseudocode | No | The paper describes the methodology in detail in Section 3.3 "DISTRIBUTION-AWARE GROUP QUANTIZATION" but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ugonfor/DGQ. |
| Open Datasets | Yes | We tested our method on various datasets, including MS-COCO (Lin et al., 2014) and Parti Prompts (Yu et al., 2022), and confirmed its superior performance in generating high-quality and text-aligned images. |
| Dataset Splits | Yes | The dataset used for calibration during quantization was generated using 64 captions from the MS-COCO Dataset (Lin et al., 2014). Similar to the approach taken in Tang et al. (2023), we evaluated prompt generalization performance using the Parti Prompts (Yu et al., 2022) dataset, which differs from the calibration dataset. For the text-to-image model, we used Stable Diffusion v1.4. We measured FID (Heusel et al., 2017) and IS (Salimans et al., 2016) scores to evaluate image quality, and the CLIP score to evaluate text-image alignment. For main results (Table 2), we compute the FID and IS using 30K samples. For the ablation study (Table 3), we use 10K samples. |
| Hardware Specification | Yes | On the other hand, DGQ used only 64 sample prompts during the activation quantization process and was completed in about 20 minutes on just one RTX A6000 (based on Stable Diffusion v1.4 with 25 steps). |
| Software Dependencies | No | The paper mentions employing the "diffusers" library, but does not specify its exact version or other key software components with their respective version numbers (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Unless specified otherwise, we apply 25 inference steps for computational efficiency. ... For Outlier-preserving Group Quantization, a group size of 8 was used. ... For Attention-aware Quantization, we applied a Log Quantizer, separating the `<start>` token, and utilized dynamic quantization. |
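The setup row above mentions two quantizer choices from the paper: group quantization with a group size of 8 and a log quantizer for attention. The sketch below illustrates both ideas in generic NumPy. The function names `group_quantize` and `log_quantize`, the bit widths, and the per-group max-abs scaling rule are illustrative assumptions, not the paper's exact DGQ implementation.

```python
import numpy as np

def group_quantize(x, group_size=8, n_bits=8):
    """Uniform per-group quantization (illustrative sketch).

    Channels are split into groups of `group_size`, each with its own
    scale, so an outlier channel only distorts its own small group.
    The paper's exact grouping/outlier-handling rule is assumed here.
    """
    orig_shape = x.shape
    x = x.reshape(-1, group_size)
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)          # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # integer codes
    return (q * scale).reshape(orig_shape)             # dequantized values

def log_quantize(x, n_bits=4):
    """Log-scale quantizer for non-negative attention scores (sketch).

    Values are snapped to powers of two, preserving the many small
    scores that a uniform grid would flush to zero.
    """
    eps = 1e-10
    exp = np.clip(np.round(np.log2(np.maximum(x, eps))),
                  -(2 ** n_bits - 1), 0)
    return np.where(x < eps, 0.0, 2.0 ** exp)
```

With a per-group scale, an activation tensor whose outliers are confined to a few channels keeps fine resolution elsewhere, which is the intuition behind the "outlier-preserving" label in the table.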