Graph Neural Networks Can (Often) Count Substructures
Authors: Paolo Pellizzoni, Till Schulz, Karsten Borgwardt
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically validate that our sufficient conditions for GNNs to count subgraphs hold on many real-world datasets, providing a theoretically-grounded explanation to our motivating observations. [...] Table 1: Test set results for subgraph counting with a GNN on molecular graphs. Reported: AUROC for multi-class classification and normalized mean avg. error of the prediction (see Sect. D.1). |
| Researcher Affiliation | Academia | Paolo Pellizzoni, Till Hendrik Schulz, Karsten Borgwardt Max Planck Institute of Biochemistry, Martinsried, Germany EMAIL |
| Pseudocode | Yes | Algorithm 1: TREE-COLSIc(u, ℓ) [...] Algorithm 2: MERGE((C1, . . . , Cδ)) |
| Open Source Code | Yes | Our code and data are available at github.com/BorgwardtLab/GNNsCanCountSubstructures. |
| Open Datasets | Yes | Mutagenicity (Kersting et al., 2016) [...] MCF-7 (Kersting et al., 2016) [...] ZINC (Gómez-Bombarelli et al., 2018) [...] ogbg-molhiv (Hu et al., 2021; Wu et al., 2018) [...] ogbg-molpcba (Hu et al., 2021; Wu et al., 2018) [...] Peptides-func (Dwivedi et al., 2022; Singh et al., 2015) [...] PCQM-Contact (Dwivedi et al., 2022) |
| Dataset Splits | Yes | The data is split into 80% for training and 20% for testing. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) were provided for the experimental setup. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and a GNN_K architecture but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We used the GNN_K architecture as described in Section A.1, with K = 4 MLP-layer-based message passing layers. [...] The dimensionality of the GNN embeddings is fixed at 512. We used the Adam optimizer with a variable learning rate and a batch size of 128. The data is split into 80% for training and 20% for testing. Finally, we train for 300 epochs and report the mean as well as the standard deviations over a total of 5 such runs. |
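The reported setup (80/20 split, 5 repeated runs, fixed hyperparameters) can be sketched as below. This is a hypothetical reconstruction for illustration only, not the authors' code; the function names and the per-run seeding scheme are assumptions.

```python
import random

# Hyperparameters as quoted from the paper's experiment setup.
HPARAMS = {
    "message_passing_layers": 4,   # K = 4 in the GNN_K architecture
    "embedding_dim": 512,
    "batch_size": 128,
    "epochs": 300,
    "optimizer": "Adam",           # variable learning rate, per the paper
}

def train_test_split(n_graphs, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx) for an 80%/20% random split."""
    idx = list(range(n_graphs))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * n_graphs)
    return idx[:cut], idx[cut:]

def run_protocol(n_graphs, n_runs=5):
    """One split per run; the paper reports mean/std over 5 such runs."""
    return [train_test_split(n_graphs, seed=run) for run in range(n_runs)]
```

Whether a fresh split is drawn per run or one split is reused across runs is not stated in the quoted text; the per-run reshuffle above is one plausible reading.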