TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models
Authors: Haocheng Huang, Jiaxin Chen, Jinyang Guo, Ruiyi Zhan, Yunhong Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various benchmarks and distinct diffusion models demonstrate that the proposed method substantially outperforms the state-of-the-art approaches in most cases, especially yielding comparable FID metrics to the full precision model on CIFAR-10 in the W6A6 setting, while enabling generating available images in the W4A4 settings. ... We conduct extensive experiments and ablation studies on various datasets and representative diffusion models, and demonstrate that our method remarkably outperforms the state-of-the-art PTQ approaches for diffusion models in most cases, especially under low bitwidths. |
| Researcher Affiliation | Academia | 1State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China 2School of Computer Science and Engineering, Beihang University, Beijing, China 3School of Artificial Intelligence, Beihang University, Beijing, China |
| Pseudocode | No | We refer to the Supplementary Material for the detailed algorithm of the proposed PAR method. |
| Open Source Code | No | The paper includes a link to an "Extended version https://dr-jiaxin-chen.github.io/page/" which is a personal webpage, not a direct link to the source code for the methodology described in this paper. There is no explicit statement about releasing the code or a direct link to a code repository within the provided text. |
| Open Datasets | Yes | we evaluate our proposed method on the ImageNet dataset (Deng et al. 2009) by using LDM-4 for the conditional generation task. For the unconditional generation task, we conduct experiments on the CIFAR-10 dataset (Krizhevsky, Hinton et al. 2009) by using DDIM (Song, Meng, and Ermon 2021), LSUN-Bedrooms and LSUN-Churches datasets (Yu et al. 2015) based on LDM-4. |
| Dataset Splits | No | On CIFAR-10 with DDIM, we follow the same setting as Q-Diffusion (Li et al. 2023b). ... On LSUN-Bedrooms with Latent Diffusion Model (LDM), we adopt the same settings as TFMQ-DM (Huang et al. 2024b)... On ImageNet, we employ a denoising process with 20 iterations, following the same setting as TFMQ-DM. The paper refers to other works for experimental settings but does not explicitly provide the dataset split information within its own text. |
| Hardware Specification | Yes | All experiments are conducted on a single RTX4090 GPU. |
| Software Dependencies | No | The paper mentions techniques like "BRECQ" and "RepQ-ViT" but does not specify any software names with version numbers (e.g., Python, PyTorch, CUDA versions) used for implementation. |
| Experiment Setup | Yes | For the weight quantization, we conduct BRECQ with 20,000 iterations for initialization, and 10,000 iterations for each progressive round with a batch size of 16. For the activation quantization, we use the commonly used hyper-parameter search method as depicted in RepQ-ViT (Li et al. 2023c) with a batch size of 64. All experiments are conducted with an 8-bit post-Softmax layer unless being specifically claimed. We set R_tru to 3 when performing TCR on the W4A8 bit-width, while not implementing the truncation operation in the remaining experiments. |