DeFoG: Discrete Flow Matching for Graph Generation

Authors: Yiming Qin, Manuel Madeira, Dorina Thanou, Pascal Frossard

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate state-of-the-art performance across synthetic, molecular, and digital pathology datasets, covering both unconditional and conditional generation settings.
Researcher Affiliation | Academia | EPFL, Lausanne, Switzerland. Correspondence to: Yiming Qin <EMAIL>, Manuel Madeira <EMAIL>.
Pseudocode | Yes | Algorithm 1: DeFoG Training; Algorithm 2: DeFoG Sampling.
Open Source Code | Yes | Code at github.com/manuelmlmadeira/DeFoG.
Open Datasets | Yes | We evaluate DeFoG using the widely adopted Planar, SBM (Martinkus et al., 2022), and Tree datasets (Bergmeister et al., 2023), along with the associated evaluation methodology. For the QM9 dataset, we follow the dataset split and evaluation metrics from Vignac et al. (2022), presenting the results in Appendix F.2.2, Tab. 8. For the larger MOSES and Guacamol datasets, we adhere to the training setup and evaluation metrics established by Polykovskiy et al. (2020) and Brown et al. (2019), respectively, with results in Tabs. 9 and 10. Lastly, we also include the ZINC250k dataset (Sterling & Irwin, 2015), which contains 249,455 molecules with up to 38 heavy atoms from 9 element types.
Dataset Splits | Yes | For the QM9 dataset, we follow the dataset split and evaluation metrics from Vignac et al. (2022). For the larger MOSES and Guacamol datasets, we adhere to the training setup and evaluation metrics established by Polykovskiy et al. (2020) and Brown et al. (2019), respectively, with results in Tabs. 9 and 10.
Hardware Specification | Yes | All the experiments in this work were run on a single NVIDIA A100-SXM4-80GB GPU.
Software Dependencies | No | The paper mentions using graph transformers and Relative Random Walk Probabilities (RRWP) but does not specify software versions for these tools or for the programming language and core libraries (e.g., Python and PyTorch versions).
Experiment Setup | Yes | In Tab. 5, we specifically highlight their values for the proposed training and sampling strategies (Sec. 3 and Appendix C.1), and the conditional guidance parameter (see Appendix E). Table 5: Training and sampling parameters for full-step sampling (500 or 1000 steps for molecular and synthetic datasets, respectively).