Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models
Authors: Zheng Gong, Ying Sun
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that Bit-DGDM not only reduces memory usage from the FP32 baseline by up to 2.8× and achieves up to 2.5× speedup, but also achieves comparable performance at ultra-low precision of up to 4-bit. ... Evaluation: We conduct extensive experiments across multiple DGDMs and diverse datasets to demonstrate that Bit-DGDM can effectively quantize DGDMs and obtain superior performance compared with other SOTA quantization baselines. |
| Researcher Affiliation | Academia | 1Artificial Intelligence Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China. Correspondence to: Ying Sun <EMAIL>. |
| Pseudocode | Yes | The overall architecture and algorithm of our framework are shown in Fig. 3 and Alg. 1. ... Algorithm 1 The inference process of quantized multiplication XW through Bit-DGDM ... Algorithm 2 Ill-conditioned low-rank decomposition of weight W ... Algorithm 3 Adaptive Sparse-Dense Kernel on GPUs |
| Open Source Code | No | Our code is publicly available here. |
| Open Datasets | Yes | For 2D structured graphs, where node relationships are represented using adjacency matrices, we employ the widely used discrete DGDM, DiGress (Vignac et al., 2023), which utilizes FiLM (Perez et al., 2018) and Graph Transformer (Dwivedi & Bresson, 2020) as backbones. These models are applied to molecular synthesis datasets, including QM9 (Wu et al., 2018) and MOSES (Polykovskiy et al., 2020), as well as non-molecular benchmarks (Martinkus et al., 2022) such as SBM and planar graphs. Moreover, we investigate the protein inverse folding task, an essential application of graph generation, where node relationships are provided by 3D coordinates, and assess the model's ability to recover the correct amino acid sequence given the protein's 3D structure. We utilize the SOTA GRADE-IF (Yi et al., 2023) as the DGDM, based on the Equivariant Graph Neural Network (Satorras et al., 2021). |
| Dataset Splits | No | These models are applied to molecular synthesis datasets, including QM9 (Wu et al., 2018) and MOSES (Polykovskiy et al., 2020), as well as non-molecular benchmarks (Martinkus et al., 2022) such as SBM and planar graphs. ... The datasets used for evaluation include graphs from the Stochastic Block Model (SBM), where the training graphs are sampled from the stochastic block model (with up to 200 nodes per graph), and planar graphs, where each graph consists of 64 nodes. |
| Hardware Specification | Yes | We use the Torch CUDA profiler to measure the latency and peak memory usage for generating graphs with a batch size of 16 on a single NVIDIA RTX3090 GPU, complemented by two 2.20GHz Intel Xeon Gold 5220R CPUs and 512GB of CPU memory. |
| Software Dependencies | No | We use the Torch CUDA profiler to measure the latency and peak memory usage for generating graphs with a batch size of 16 on a single NVIDIA RTX3090 GPU, complemented by two 2.20GHz Intel Xeon Gold 5220R CPUs and 512GB of CPU memory. |
| Experiment Setup | Yes | For Bit-DGDM, we first determine the thresholds, τmax and τmin, for identifying activation outliers by selecting the top 0.1% highest and lowest values, respectively. To obtain the activation values, we randomly generate 32 samples using the full-precision DGDMs. For low-rank weight decomposition, we set the rank r to 32, the α-sparsity parameter α to 1%, and the step size η to 0.1. ... We use the Torch CUDA profiler to measure the latency and peak memory usage for generating graphs with a batch size of 16... |
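The quoted experiment setup can be sketched in code: thresholds τmax and τmin are chosen so that the top and bottom 0.1% of calibration activation values are treated as outliers, and activations are then split into a sparse (outlier, higher-precision) part and a dense (low-bit) part, in the spirit of the paper's adaptive sparse-dense scheme. This is a minimal NumPy sketch, not the authors' implementation; the function names and the percentile-based split are illustrative assumptions.

```python
import numpy as np

def outlier_thresholds(activations, top_frac=0.001):
    """Pick tau_max / tau_min as the boundaries of the top and bottom
    `top_frac` (0.1%) of calibration activation values.
    Hypothetical helper; names are illustrative, not from the paper."""
    flat = np.asarray(activations, dtype=np.float64).ravel()
    tau_max = np.quantile(flat, 1.0 - top_frac)  # values above are outliers
    tau_min = np.quantile(flat, top_frac)        # values below are outliers
    return tau_min, tau_max

def split_sparse_dense(x, tau_min, tau_max):
    """Route outliers to a sparse part (kept in higher precision) and the
    remaining values to a dense part (quantized to low bit-width), so that
    sparse + dense reconstructs the original activation tensor."""
    x = np.asarray(x)
    outlier_mask = (x > tau_max) | (x < tau_min)
    sparse = np.where(outlier_mask, x, 0.0)  # few large-magnitude entries
    dense = np.where(outlier_mask, 0.0, x)   # bulk of the distribution
    return sparse, dense
```

In the paper's setting, the calibration activations would come from 32 samples generated by the full-precision DGDM; here any array stands in for them. The split guarantees exact reconstruction (`sparse + dense == x`), so only the quantization of the dense part introduces error.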