Advancing Graph Generation through Beta Diffusion
Authors: Xinyang Liu, Yilin He, Bo Chen, Mingyuan Zhou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, our experiments generating data on various synthetic and real-world graphs confirm the effectiveness of beta diffusion as a strategic choice within the design framework of the backbone diffusion model, especially for graph generation tasks. We compare GBD against various graph generation methods, including autoregressive models... We evaluate using maximum mean discrepancy (MMD) to compare the distributions of key graph properties between test graphs and generated graphs: degree (Deg.), clustering coefficient (Clus.), and 4-node orbit counts (Orbit). Additionally, we report the eigenvalues of the graph Laplacian (Spec.) and the percentage of valid, unique, and novel graphs (V.U.N.) to assess how well the model captures both intrinsic features and global graph properties. Ablation study on precondition and computation domain. Ablation study on concentration modulation. |
| Researcher Affiliation | Academia | Xinyang Liu1, Yilin He1, Bo Chen2, Mingyuan Zhou1 1The University of Texas at Austin, 2Xidian University EMAIL, EMAIL EMAIL, EMAIL |
| Pseudocode | Yes | We provide the pseudo-code for the training and sampling of our generative framework in the original domain and the logit domain, respectively. Specifically, Algorithm 3 and Algorithm 1 show the training and sampling procedures in the original domain, respectively. In practice, we migrate our proposed GBD to the logit domain, as shown in Algorithm 4 and Algorithm 2, in most cases. |
| Open Source Code | Yes | Our PyTorch code is available at https://github.com/xinyangATK/GraphBetaDiffusion. |
| Open Datasets | Yes | We use five graph datasets with varying sizes, connectivity, and topology, commonly employed as benchmarks. Ego-small includes 200 sub-graphs from the Citeseer network... Community-small has 100 synthetic graphs... Grid contains 100 2D grid graphs... Planar includes 200 synthetic planar graphs... SBM comprises 200 stochastic block model graphs... We consider two widely-used molecule datasets as benchmarks in Jo et al. (2022): QM9 (Ramakrishnan et al., 2014), which consists of 133,885 molecules... and ZINC250k (Irwin et al., 2012), which consists of 249,455 molecules... |
| Dataset Splits | Yes | For a fair comparison, we follow the experimental setup of Jo et al. (2022; 2023), using the same train/test split. We evaluate using maximum mean discrepancy (MMD) to compare the distributions of key graph properties between test graphs and generated graphs... For a fair comparison, we follow the experimental and evaluation settings of Jo et al. (2022; 2023), using the same train/test split, where 80% of the data is used as the training set and the remaining 20% as the test set. |
| Hardware Specification | Yes | For all experiments, we utilized the PyTorch (Paszke et al., 2019) framework to implement GBD and trained the model with NVIDIA GeForce RTX 4090 and RTX A5000 GPUs. |
| Software Dependencies | No | For all experiments, we utilized the PyTorch (Paszke et al., 2019) framework to implement GBD... We kekulize the molecules using the RDKit library (Landrum et al., 2006)... In practice, we utilize the NetworkX library (Hagberg et al., 2008) to implement this. (No specific versions for PyTorch, RDKit, or NetworkX are provided.) |
| Experiment Setup | Yes | We set it to 0.01, following Zhou et al. (2023), and found that this configuration is sufficient to produce graphs that closely resemble the reference graphs without further tuning... We set the diffusion steps to 1000 for all the diffusion models. For the important hyperparameters mentioned in Sec 2.3, we usually set Scale = 0.9, Shift = 0.09, and η = [10000, 100, 30, 10] for the normalized degrees falling in the intervals split by [1.0, 0.8, 0.4, 0.1], respectively. In practice, we set the threshold to 0.5 to quantize the generated continuous adjacency matrix. For both QM9 and ZINC250k, we encode nodes and edges as one-hot vectors and set Scale = 0.9, Shift = 0.09. For η modulated in molecule generation, with the help of chemical knowledge, we apply η = [10000, 100, 100, 100, 30] to carbon-carbon, carbon-nitrogen, carbon-oxygen, carbon-fluorine, and other possible bonds, respectively. For η of nodes, we apply η = [10000, 100, 100, 30] to carbon, nitrogen, oxygen, and other possible atoms, respectively. |
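The MMD evaluation quoted above (comparing distributions of graph statistics such as degree between test and generated graphs) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian-kernel bandwidth `sigma`, the adjacency-list graph encoding, and all function names here are our assumptions.

```python
import math

def degree_histogram(adj, max_deg):
    """Normalized degree histogram of a graph given as an adjacency list."""
    counts = [0] * (max_deg + 1)
    for nbrs in adj:
        counts[len(nbrs)] += 1
    n = len(adj)
    return [c / n for c in counts]

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two histograms; sigma is an assumed bandwidth."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two sets of histograms."""
    def mean_k(a_set, b_set):
        return sum(gaussian_kernel(a, b, sigma)
                   for a in a_set for b in b_set) / (len(a_set) * len(b_set))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

# Toy example: a triangle vs. a 3-node path.
triangle = [[1, 2], [0, 2], [0, 1]]  # every node has degree 2
path = [[1], [0, 2], [1]]            # degrees 1, 2, 1
h_test = [degree_histogram(triangle, 2)]
h_gen = [degree_histogram(path, 2)]
print(mmd_squared(h_test, h_test))   # identical sets -> 0.0
print(mmd_squared(h_test, h_gen))    # positive, reflecting the mismatch
```

The same estimator applies to the clustering-coefficient, orbit-count, and Laplacian-spectrum statistics by swapping in the corresponding histogram.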
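The final quantization step (thresholding the generated continuous adjacency matrix at 0.5) might look like the sketch below. The symmetrization by averaging with the transpose and the removal of self-loops are our assumptions about standard post-processing; only the 0.5 threshold comes from the report.

```python
def quantize_adjacency(A, threshold=0.5):
    """Binarize a continuous adjacency matrix: edge iff the entry exceeds threshold.
    Symmetrizes by averaging with the transpose and zeroes the diagonal
    (both are assumed post-processing conventions, not confirmed details)."""
    n = len(A)
    sym = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
    return [[1 if i != j and sym[i][j] > threshold else 0 for j in range(n)]
            for i in range(n)]

A = [[0.9, 0.7, 0.2],
     [0.6, 0.8, 0.9],
     [0.1, 0.4, 0.3]]
print(quantize_adjacency(A))  # -> [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```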