Generating Graphs via Spectral Diffusion
Authors: Giorgia Minello, Alessandro Bicciato, Luca Rossi, Andrea Torsello, Luca Cosmo
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive set of experiments on both synthetic and real-world graphs demonstrates the strengths of our model against state-of-the-art alternatives. |
| Researcher Affiliation | Academia | Giorgia Minello Ca' Foscari University EMAIL Alessandro Bicciato Ca' Foscari University EMAIL Luca Rossi The Hong Kong Polytechnic University EMAIL Andrea Torsello Ca' Foscari University EMAIL Luca Cosmo Ca' Foscari University EMAIL |
| Pseudocode | No | The paper describes the proposed pipeline and neural network architectures with diagrams (Figure 1 and Figure 2) and textual descriptions in Section 4, but it does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | In order to guarantee the reproducibility of both our model architecture and results, we have made our code accessible on an online public repository 2 |
| Open Datasets | Yes | The synthetic datasets we consider are (i) Community-small (12 ≤ \|V\| ≤ 20), (ii) Planar (\|V\| = 64), and (iii) Stochastic Block Model (SBM) (2-5 communities and 20-40 nodes per community). The real-world datasets are both from the molecular domain, namely (i) Proteins (100-500 nodes) Dobson & Doig (2003) and (ii) QM9 (≤ 9 nodes) Ruddigkeit et al. (2012); Ramakrishnan et al. (2014). |
| Dataset Splits | Yes | for the synthetic datasets, we decided to create a larger set of test graphs: 200 graphs for Planar and SBM, and 100 graphs for Community-small. Accordingly, we let each model generate an equivalent number of graphs (200 for Planar and SBM, 100 for Community-small) to compute the MMD measures. Due to the limited number of graphs in the Proteins dataset (see Appendix A), we also followed a different and more robust protocol to evaluate the generated graphs on this dataset. Rather than utilizing a single subset of the dataset as a test set, we created 10 folds (identical for each method), allowing us to report the average of each metric (± standard deviation) over the 10 folds. ... For the training of the diffusion model, we split each dataset into 90% train and 10% test... For QM9, we allocate 10k molecules for validation, 10k for testing, and the rest for training. |
| Hardware Specification | Yes | These experiments were conducted on a computer equipped with an AMD Ryzen 7 3700X processor, 64GB of RAM, and an NVIDIA RTX 3070 8GB graphics card. |
| Software Dependencies | No | The paper mentions using RDKit for validity checking for molecule graphs (QM9) but does not provide a specific version number. It also implicitly relies on deep learning frameworks but does not list any with version numbers. |
| Experiment Setup | Yes | For the training of the diffusion model, we split each dataset into 90% train and 10% test, and we train the Spectral Diffusion on the whole dataset for 100k epochs, using early stopping on the reconstruction loss. We performed a grid search on the number of layers over 6, 9, and 12, and selected the best model according to the degree metric computed between the graphs reconstructed directly from the eigenvectors/values and the graphs of the training set. The sampling has been done using DDIM with 200 steps. Moreover, we generate each sample 4 times and keep the one with the lowest deviation from orthogonality. For the training of the Graph Predictor, we used the same splits as the Spectral Diffusion, and trained for 100k epochs. We performed early stopping by comparing the degree distribution of the generated graphs with the training graphs. We used 6 PPGN layers and 3 PPGN layers for the Graph Predictor and the discriminator network respectively, except for QM9, in which the Graph Predictor is also composed of three layers. For QM9, we let the Graph Predictor generate edge features as well, similarly to Martinkus et al. (2022). For all datasets, following the observations in Appendix E, we train both Spectral Diffusion and Predictor on the 16 smallest and 32 largest eigenpairs and select the final model according to the best average metrics on the validation set. |
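The paper's setup states that each spectrum is sampled 4 times and the candidate with the lowest deviation from orthogonality is kept. The paper does not spell out the selection code; the sketch below is a minimal, hypothetical illustration of that best-of-n step, measuring deviation as the Frobenius distance of V^T V from the identity and using a toy random sampler in place of the actual DDIM sampler.

```python
import numpy as np

def orthogonality_deviation(V):
    # Frobenius distance of V^T V from the identity; a hypothetical stand-in
    # for the paper's "deviation from orthogonality" criterion.
    k = V.shape[1]
    return np.linalg.norm(V.T @ V - np.eye(k))

def best_of_n(sample_fn, n=4):
    # Draw n candidate eigenvector matrices and keep the most orthogonal one,
    # mirroring the "generate each sample 4 times" protocol.
    candidates = [sample_fn() for _ in range(n)]
    return min(candidates, key=orthogonality_deviation)

# Toy stand-in for the DDIM sampler: near-orthogonal random matrices.
rng = np.random.default_rng(0)
def fake_sampler(num_nodes=16, k=8):
    Q, _ = np.linalg.qr(rng.standard_normal((num_nodes, k)))
    return Q + 0.01 * rng.standard_normal((num_nodes, k))

V = best_of_n(fake_sampler, n=4)
print(orthogonality_deviation(V))
```

In the actual pipeline, `sample_fn` would be one full 200-step DDIM run of the Spectral Diffusion model; only the selection logic is sketched here.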