Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective

Authors: Steve Azzolin, Sagar Malhotra, Andrea Passerini, Stefano Teso

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that even a simple instantiation of Dual-Channel GNNs can recover succinct rules and perform on par or better than widely used SE-GNNs. [...] Empirical results on three synthetic and five real-world graph classification datasets highlight that DC-GNNs perform as well or better than SE-GNNs by adaptively employing one channel or both depending on the task.
Researcher Affiliation | Academia | DISI, University of Trento, Trento, Italy; TU Wien, Wien, Austria. Correspondence to: Steve Azzolin <EMAIL>, Sagar Malhotra <EMAIL>.
Pseudocode | No | The paper describes the methodologies narratively and mathematically, but does not include any explicitly labeled pseudocode or algorithm blocks. For example, Definition 6.1 (DC-GNN) provides a mathematical definition rather than a step-by-step algorithm.
Open Source Code | Yes | Full details about the empirical analysis are in Appendix B. Our code is publicly available on GitHub: https://github.com/steveazzolin/beyond-topo-segnns
Open Datasets | Yes | Synthetic datasets include GOODMotif (Gui et al., 2022) and two novel datasets. Red Blue Nodes contains random graphs where each node is either red or blue, and the task is to predict which color is more frequent. Similarly, Topo Feature contains random graphs where each node is either red or uncolored, and the task is to predict whether the graph contains at least two red nodes and a cycle, which is randomly attached to the base graph. [...] Real-world datasets include MUTAG (Debnath et al., 1991), BBBP (Morris et al., 2020), MNIST75sp (Knyazev et al., 2019), AIDS (Riesen & Bunke, 2008), and Graph-SST2 (Yuan et al., 2022).
Dataset Splits | Yes | We also generate two OOD splits, where respectively either the number of total nodes is increased to 250 (OOD1), or where the distribution of the base graph is switched to an Erdős-Rényi distribution (OOD2) (Erdős et al., 1959). For GOODMotif we use the original OOD splits (Gui et al., 2022). [...] Every model is trained for the same 10 random splits, and the optimization protocol is fixed across all experiments following previous work (Miao et al., 2022a) and using the Adam optimizer (Kingma & Ba, 2015).
Hardware Specification | Yes | Experiments are run on two different Linux machines, with CUDA 12.6 and a single NVIDIA GeForce RTX 4090, or with CUDA 12.0 and a single NVIDIA TITAN V.
Software Dependencies | Yes | Our implementation is done using PyTorch 2.4.1 (Paszke et al., 2017) and PyG 2.4.0 (Fey & Lenssen, 2019).
Experiment Setup | Yes | Model hyper-parameters. We set the weight of the explanation regularization as follows: for GISST, we weight all regularization terms by 0.01 in the final loss; for SMGNN, we set the L1 and entropy regularization to 1.0 and 0.8, respectively; for GSAT, we set the value of r to 0.7 for GOODMotif, MNIST75sp, Graph-SST2, and BBBP, to 0.5 for Topo Feature, AIDS, AIDSC1, and MUTAG, and to 0.3 for Red Blue Nodes. Also, for GSAT the decay of r is applied every 10 steps for every dataset, except for Graph-SST2 and GOODMotif where it is applied every 20. Then, the parameter λ regulating the weight of the regularization is set to 0.001 for all experiments with SMGNN, and to 1 for GSAT on every dataset except Red Blue Nodes. [...] For each model, we set the hidden dimension of GNN layers to 64 for MUTAG, 300 for GOODMotif, BBBP, and Graph-SST2, and 100 otherwise. Similarly, we use a dropout value of 0.5 for GOODMotif and Graph-SST2, of 0.3 for MNIST75sp, MUTAG, and BBBP, and of 0.0 otherwise. [...] Every model is trained for the same 10 random splits, and the optimization protocol is fixed across all experiments following previous work (Miao et al., 2022a) and using the Adam optimizer (Kingma & Ba, 2015). Also, for experiments with Dual-Channel GNN, we fix an initial warmup of 20 epochs during which the two channels are trained independently to output the ground-truth label. After this warmup, the overall model is trained as a whole. The total number of epochs is fixed to 100 for every dataset, except for Graph-SST2 where it is set to 200.
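The Red Blue Nodes task quoted in the Open Datasets row (random graphs labeled by the majority node color) is simple enough to sketch with the standard library. The generator below is purely illustrative; the function name, the edge probability, and the tie re-sampling are assumptions, not the authors' data pipeline.

```python
import random

def make_red_blue_instance(num_nodes, edge_prob, rng):
    """Illustrative generator for a Red Blue Nodes style example:
    a random graph whose label is the more frequent node color."""
    # Color each node red or blue uniformly at random,
    # re-sampling on ties so the majority label is well defined.
    while True:
        colors = [rng.choice(["red", "blue"]) for _ in range(num_nodes)]
        if colors.count("red") != colors.count("blue"):
            break
    # Random edge set over the colored nodes (each pair is connected
    # independently with probability edge_prob).
    edges = [(u, v)
             for u in range(num_nodes)
             for v in range(u + 1, num_nodes)
             if rng.random() < edge_prob]
    label = "red" if colors.count("red") > colors.count("blue") else "blue"
    return colors, edges, label

rng = random.Random(0)
colors, edges, label = make_red_blue_instance(9, 0.3, rng)
```

Since the label depends only on node colors and not on topology, a purely topological explainer has nothing meaningful to attribute here, which is the point of including this dataset.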
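The protocol of training every model on the same 10 random splits (Dataset Splits row) can be reproduced by deriving each split from a fixed seed. A minimal stdlib-only sketch follows; the 80/10/10 fractions and the seed-per-split scheme are assumptions for illustration, not values taken from the paper.

```python
import random

def make_splits(num_graphs, num_splits=10, train_frac=0.8, val_frac=0.1):
    """Seed-controlled splitter: regenerating with the same seeds
    yields identical train/val/test partitions for every model."""
    splits = []
    for seed in range(num_splits):
        rng = random.Random(seed)  # one fixed seed per split
        idx = list(range(num_graphs))
        rng.shuffle(idx)
        n_train = int(train_frac * num_graphs)
        n_val = int(val_frac * num_graphs)
        splits.append({
            "train": idx[:n_train],
            "val": idx[n_train:n_train + n_val],
            "test": idx[n_train + n_val:],
        })
    return splits

splits = make_splits(100)
```

Deriving splits deterministically from seeds, rather than storing index files, keeps the comparison fair across models without shipping extra artifacts.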
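The per-dataset hyper-parameters quoted in the Experiment Setup row (hidden dimension, dropout, GSAT's r) can be collected into a single lookup. The helper below only reorganizes the reported values; the function name and the exact dataset-name strings are illustrative assumptions, not the authors' code.

```python
def model_hparams(dataset):
    """Per-dataset hyper-parameters as reported in the paper's setup."""
    # Hidden dimension: 64 for MUTAG; 300 for GOODMotif, BBBP,
    # Graph-SST2; 100 otherwise.
    hidden = {"MUTAG": 64, "GOODMotif": 300, "BBBP": 300,
              "Graph-SST2": 300}.get(dataset, 100)
    # Dropout: 0.5 for GOODMotif, Graph-SST2; 0.3 for MNIST75sp,
    # MUTAG, BBBP; 0.0 otherwise.
    dropout = {"GOODMotif": 0.5, "Graph-SST2": 0.5, "MNIST75sp": 0.3,
               "MUTAG": 0.3, "BBBP": 0.3}.get(dataset, 0.0)
    # GSAT's r: 0.7 / 0.5 / 0.3 depending on the dataset group.
    if dataset in {"GOODMotif", "MNIST75sp", "Graph-SST2", "BBBP"}:
        gsat_r = 0.7
    elif dataset in {"TopoFeature", "AIDS", "AIDSC1", "MUTAG"}:
        gsat_r = 0.5
    else:  # Red Blue Nodes
        gsat_r = 0.3
    return {"hidden_dim": hidden, "dropout": dropout, "gsat_r": gsat_r}
```

Centralizing the reported values in one place makes it easy to cross-check a reimplementation against the paper's configuration row by row.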