Advancing Graph Generation through Beta Diffusion
Authors: Xinyang Liu, Yilin He, Bo Chen, Mingyuan Zhou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, our experiments generating data on various synthetic and real-world graphs confirm the effectiveness of beta diffusion as a strategic choice within the design framework of the backbone diffusion model, especially for graph generation tasks. We compare GBD against various graph generation methods, including autoregressive models... We evaluate using maximum mean discrepancy (MMD) to compare the distributions of key graph properties between test graphs and generated graphs: degree (Deg.), clustering coefficient (Clus.), and 4-node orbit counts (Orbit). Additionally, we report the eigenvalues of the graph Laplacian (Spec.) and the percentage of valid, unique, and novel graphs (V.U.N.) to assess how well the model captures both intrinsic features and global graph properties. Ablation study on precondition and computation domain. Ablation study on concentration modulation. |
| Researcher Affiliation | Academia | Xinyang Liu1, Yilin He1, Bo Chen2, Mingyuan Zhou1 1The University of Texas at Austin, 2Xidian University EMAIL, EMAIL EMAIL, EMAIL |
| Pseudocode | Yes | We provide the pseudo-code for the training and sampling of our generative framework in the original domain and the logit domain, respectively. Specifically, Algorithm 3 and Algorithm 1 show the training and sampling procedures in the original domain, respectively. In practice, we migrate our proposed GBD to the logit domain, as shown in Algorithm 4 and Algorithm 2, in most cases. |
| Open Source Code | Yes | Our PyTorch code is available at https://github.com/xinyangATK/GraphBetaDiffusion. |
| Open Datasets | Yes | We use five graph datasets with varying sizes, connectivity, and topology, commonly employed as benchmarks. Ego-small includes 200 sub-graphs from the Citeseer network... Community-small has 100 synthetic graphs... Grid contains 100 2D grid graphs... Planar includes 200 synthetic planar graphs... SBM comprises 200 stochastic block model graphs... We consider two widely-used molecule datasets as benchmarks in Jo et al. (2022): QM9 (Ramakrishnan et al., 2014), which consists of 133,885 molecules... and ZINC250k (Irwin et al., 2012), which consists of 249,455 molecules... |
| Dataset Splits | Yes | For a fair comparison, we follow the experimental setup of Jo et al. (2022; 2023), using the same train/test split. We evaluate using maximum mean discrepancy (MMD) to compare the distributions of key graph properties between test graphs and generated graphs... For a fair comparison, we follow the experimental and evaluation settings of Jo et al. (2022; 2023), using the same train/test split, where 80% of the data is used as the training set and the remaining 20% as the test set. |
| Hardware Specification | Yes | For all experiments, we utilized the PyTorch (Paszke et al., 2019) framework to implement GBD and trained the model with NVIDIA GeForce RTX 4090 and RTX A5000 GPUs. |
| Software Dependencies | No | For all experiments, we utilized the PyTorch (Paszke et al., 2019) framework to implement GBD... We kekulize the molecules using the RDKit library (Landrum et al., 2006)... In practice, we utilize the NetworkX library (Hagberg et al., 2008) to implement this. (No specific versions for PyTorch, RDKit, or NetworkX are provided.) |
| Experiment Setup | Yes | We set it to 0.01, following Zhou et al. (2023), and found that this configuration is sufficient to produce graphs that closely resemble the reference graphs without further tuning... We set the diffusion steps to 1000 for all the diffusion models. For the important hyperparameters mentioned in Sec 2.3, we usually set Scale = 0.9, Shift = 0.09, and η = [10000, 100, 30, 10] for the normalized degrees falling in the intervals split by [1.0, 0.8, 0.4, 0.1], respectively. In practice, we set the threshold to 0.5 to quantize the generated continuous adjacency matrix. For both QM9 and ZINC250k, we encode nodes and edges as one-hot vectors and set Scale = 0.9, Shift = 0.09. For η modulated in molecule generation, with the help of chemical knowledge, we apply η = [10000, 100, 100, 100, 30] to carbon-carbon, carbon-nitrogen, carbon-oxygen, carbon-fluorine, and other possible bonds, respectively. For η of nodes, we apply η = [10000, 100, 100, 30] to carbon, nitrogen, oxygen, and other possible atoms, respectively. |
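The MMD evaluation quoted above (comparing distributions of graph statistics such as degree between test and generated graphs) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian-kernel bandwidth `sigma`, the adjacency-list graph encoding, and all function names here are our assumptions.

```python
import math

def degree_histogram(adj, max_deg):
    """Normalized degree histogram of a graph given as an adjacency list."""
    counts = [0] * (max_deg + 1)
    for nbrs in adj:
        counts[len(nbrs)] += 1
    n = len(adj)
    return [c / n for c in counts]

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two histograms; sigma is an assumed bandwidth."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two sets of histograms."""
    def mean_k(a_set, b_set):
        return sum(gaussian_kernel(a, b, sigma)
                   for a in a_set for b in b_set) / (len(a_set) * len(b_set))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

# Toy example: a triangle vs. a 3-node path.
triangle = [[1, 2], [0, 2], [0, 1]]  # every node has degree 2
path = [[1], [0, 2], [1]]            # degrees 1, 2, 1
h_test = [degree_histogram(triangle, 2)]
h_gen = [degree_histogram(path, 2)]
print(mmd_squared(h_test, h_test))   # identical sets -> 0.0
print(mmd_squared(h_test, h_gen))    # positive, reflecting the mismatch
```

The same estimator applies to the clustering-coefficient, orbit-count, and Laplacian-spectrum statistics by swapping in the corresponding histogram.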
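The final quantization step (thresholding the generated continuous adjacency matrix at 0.5) might look like the sketch below. The symmetrization by averaging with the transpose and the removal of self-loops are our assumptions about standard post-processing; only the 0.5 threshold comes from the report.

```python
def quantize_adjacency(A, threshold=0.5):
    """Binarize a continuous adjacency matrix: edge iff the entry exceeds threshold.
    Symmetrizes by averaging with the transpose and zeroes the diagonal
    (both are assumed post-processing conventions, not confirmed details)."""
    n = len(A)
    sym = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
    return [[1 if i != j and sym[i][j] > threshold else 0 for j in range(n)]
            for i in range(n)]

A = [[0.9, 0.7, 0.2],
     [0.6, 0.8, 0.9],
     [0.1, 0.4, 0.3]]
print(quantize_adjacency(A))  # -> [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```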