Graph Neural Networks Can (Often) Count Substructures

Authors: Paolo Pellizzoni, Till Schulz, Karsten Borgwardt

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically validate that our sufficient conditions for GNNs to count subgraphs hold on many real-world datasets, providing a theoretically-grounded explanation to our motivating observations. [...] Table 1: Test set results for subgraph counting with a GNN on molecular graphs. Reported: AUROC for multi-class classification and normalized mean avg. error of the prediction (see Sect. D.1).
Researcher Affiliation | Academia | Paolo Pellizzoni, Till Hendrik Schulz, Karsten Borgwardt, Max Planck Institute of Biochemistry, Martinsried, Germany
Pseudocode | Yes | Algorithm 1: TREE-COLSIc(u, ℓ) [...] Algorithm 2: MERGE((C1, ..., Cδ))
Open Source Code | Yes | Our code and data are available at github.com/BorgwardtLab/GNNsCanCountSubstructures.
Open Datasets | Yes | Mutagenicity (Kersting et al., 2016) [...] MCF-7 (Kersting et al., 2016) [...] ZINC (Gómez-Bombarelli et al., 2018) [...] ogbg-molhiv (Hu et al., 2021; Wu et al., 2018) [...] ogbg-molpcba (Hu et al., 2021; Wu et al., 2018) [...] Peptides-func (Dwivedi et al., 2022; Singh et al., 2015) [...] PCQM-Contact (Dwivedi et al., 2022)
Dataset Splits | Yes | The data is split into 80% for training and 20% for testing.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory) were provided for the experimental setup.
Software Dependencies | No | The paper mentions using the Adam optimizer and a GNNK architecture but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We used the GNNK architecture as described in Section A.1, with K = 4 MLP-layer-based message passing layers. [...] The dimensionality of the GNN embeddings is fixed at 512. We used the Adam optimizer with a variable learning rate and a batch size of 128. The data is split into 80% for training and 20% for testing. Finally, we train for 300 epochs and report the mean as well as the standard deviations over a total of 5 such runs.
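The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a hypothetical illustration only; the names below are not taken from the authors' released code (linked in the Open Source Code row), and only the numeric values come from the paper's reported setup.

```python
# Hypothetical sketch of the reported training setup. Field names are
# illustrative assumptions; the values are the ones quoted in the report.
config = {
    "message_passing_layers": 4,  # K = 4 MLP-layer-based layers
    "embedding_dim": 512,         # GNN embedding dimensionality
    "optimizer": "Adam",          # with a variable learning rate
    "batch_size": 128,
    "epochs": 300,
    "num_runs": 5,                # mean and std reported over 5 runs
    "train_fraction": 0.8,        # 80% train / 20% test split
}

def split_sizes(num_graphs: int, train_fraction: float = 0.8) -> tuple[int, int]:
    """Return (train, test) set sizes for the reported 80/20 split."""
    n_train = int(num_graphs * train_fraction)
    return n_train, num_graphs - n_train
```

For example, `split_sizes(1000)` yields `(800, 200)`, matching the reported 80%/20% partition.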