Cometh: A continuous-time discrete-state graph diffusion model
Authors: Antoine Siraudin, Fragkiskos D. Malliaros, Christopher Morris
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that integrating continuous time leads to significant improvements across various metrics over state-of-the-art discrete-state diffusion models on a large set of molecular and non-molecular benchmark datasets. In terms of valid, unique, and novel (VUN) samples, Cometh obtains a near-perfect performance of 99.5% on the planar graph dataset and outperforms DiGress by 12.6% on the large GuacaMol dataset. |
| Researcher Affiliation | Academia | Antoine Siraudin, RWTH Aachen; Fragkiskos D. Malliaros, CentraleSupélec, Inria, Université Paris-Saclay; Christopher Morris, RWTH Aachen |
| Pseudocode | Yes | Algorithm 1 (Training): Input: a graph G = (X, E). Sample t ~ U([0, 1]); sample a sparse noisy graph G(t) ~ X Q̄_X^t × E Q̄_E^t; predict the clean graph p^θ_{0\|t}(G \| G(t)) using the neural network; compute L_CE = −Σ_i log p^θ_{0\|t}(x_i^(0) \| G(t)) − λ Σ_{i<j} log p^θ_{0\|t}(e_ij^(0) \| G(t)); update θ using L_CE. Algorithm 2 (τ-leaping sampling of Cometh): Sample n from the training data distribution; sample a random graph G(T) from the prior distribution; while t > 0.01: for i = 1 to n, for each state x ≠ x_i^(t) in X, compute the reverse rate R̂_X^{t,θ}(x_i^(t), x) = R_X^t(x, x_i^(t)) Σ_{x^(0)} [q_{t\|0}(x \| x_i^(0)) / q_{t\|0}(x_i^(t) \| x_i^(0))] p^θ_{0\|t}(x_i^(0) \| G(t)); sample J_{x_i^(t)→x} ~ Poisson(τ R̂_X^{t,θ}(x_i^(t), x)); count transitions on node i. |
| Open Source Code | Yes | Our code is available at github.com/asiraudin/Cometh. |
| Open Datasets | Yes | To assess the ability of our method to model attributed graph distributions, we evaluate its performance on the standard dataset QM9 (Wu et al., 2018). We further evaluate Cometh on two large molecule generation benchmarks, MOSES (Polykovskiy et al., 2020) and GuacaMol (Brown et al., 2019). |
| Dataset Splits | Yes | We use the same split as Vignac et al. (2022), with 100k molecules for training, 10k for testing, and the remaining data for the validation set. We evaluate our method on two datasets from the SPECTRE benchmark (Martinkus et al., 2022), with 200 graphs each. Planar contains planar graphs of 64 nodes, and SBM contains graphs drawn from a stochastic block model with up to 187 nodes. We use the same split as the original paper, which uses 128 graphs for training, 40 for testing, and the rest as a validation set. |
| Hardware Specification | Yes | Experiments on QM9, Planar, and SBM were carried out using a single V100 or A10 GPU at the training and sampling stages. The training time on QM9 is 6 hours, while the training time on SBM and Planar is approximately two and a half days. We trained models on MOSES or GuacaMol using two A100 GPUs. To sample from these models, we used a single A100 GPU. |
| Software Dependencies | No | The paper mentions the RDKit library for molecule kekulization and the Psi4 library for property estimation, but specific version numbers for these are not provided. |
| Experiment Setup | Yes | On Planar, we report results using τ = 0.002, i.e., using 500 τ-leaping steps. We also evaluate our model using 10 corrector steps after each predictor step when t < 0.1T, with τ = 0.002, for 1000 τ-leaping steps. We found our best results using τ_c = 0.7. On SBM, we report results using τ = 0.001, i.e., using 1000 τ-leaping steps. We trained models sweeping over p_uncond ∈ {0.1, 0.2}, and explored different values for s ∈ {1, …, 6} during sampling. We obtained our best results using p_uncond = 0.1 and s = 1. |
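The τ-leaping sampler quoted in the pseudocode row can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation (their code is at github.com/asiraudin/Cometh): it assumes the per-node reverse rates R̂(x_i^(t), x) have already been computed from the denoising network, draws Poisson(τ · rate) jump counts per candidate state, and applies a transition only when exactly one target state fires, a common rejection rule for τ-leaping on discrete state spaces. The function name and array layout are hypothetical.

```python
import numpy as np

def tau_leap_step(x_t, rates, tau, rng):
    """One hypothetical τ-leaping update for discrete node states.

    x_t   : (n,) array, current state index of each node
    rates : (n, K) array, reverse-time rates R̂(x_t[i] -> x) toward each
            of the K states; the self-entry rates[i, x_t[i]] is ignored
    tau   : leap size (step length in time)
    rng   : numpy Generator

    Returns the updated (n,) state array. A node whose Poisson draws fire
    for zero or for several target states is left unchanged.
    """
    n, K = rates.shape
    x_new = x_t.copy()
    for i in range(n):
        r = rates[i].copy()
        r[x_t[i]] = 0.0                    # no self-transition rate
        jumps = rng.poisson(tau * r)       # J_x ~ Poisson(τ · R̂(x_t[i], x))
        targets = np.flatnonzero(jumps)    # states that fired at least once
        if len(targets) == 1:              # exactly one target: take the jump
            x_new[i] = targets[0]
    return x_new
```

A full sampler would loop this update from t = T down to the cutoff (t > 0.01 in Algorithm 2), recomputing `rates` from the network's clean-graph prediction at each step, and apply the analogous update to edge states.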