Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective

Authors: Steve Azzolin, Sagar Malhotra, Andrea Passerini, Stefano Teso

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that even a simple instantiation of Dual-Channel GNNs can recover succinct rules and perform on par or better than widely used SE-GNNs. [...] Empirical results on three synthetic and five real-world graph classification datasets highlight that DC-GNNs perform as well or better than SE-GNNs by adaptively employing one channel or both depending on the task.
Researcher Affiliation | Academia | DISI, University of Trento, Trento, Italy; TU Wien, Wien, Austria. Correspondence to: Steve Azzolin <EMAIL>, Sagar Malhotra <EMAIL>.
Pseudocode | No | The paper describes the methodologies narratively and mathematically, but does not include any explicitly labeled pseudocode or algorithm blocks. For example, Definition 6.1 (DC-GNN) provides a mathematical definition rather than a step-by-step algorithm.
Open Source Code | Yes | Full details about the empirical analysis are in Appendix B. Our code is publicly available on GitHub: https://github.com/steveazzolin/beyond-topo-segnns
Open Datasets | Yes | Synthetic datasets include GOODMotif (Gui et al., 2022) and two novel datasets. Red Blue Nodes contains random graphs where each node is either red or blue, and the task is to predict which color is more frequent. Similarly, Topo Feature contains random graphs where each node is either red or uncolored, and the task is to predict whether the graph contains at least two red nodes and a cycle, which is randomly attached to the base graph. [...] Real-world datasets include MUTAG (Debnath et al., 1991), BBBP (Morris et al., 2020), MNIST75sp (Knyazev et al., 2019), AIDS (Riesen & Bunke, 2008), and Graph-SST2 (Yuan et al., 2022).
Dataset Splits | Yes | We also generate two OOD splits, where respectively either the number of total nodes is increased to 250 (OOD1), or where the distribution of the base graph is switched to an Erdős-Rényi distribution (OOD2) (Erdős et al., 1959). For GOODMotif we use the original OOD splits (Gui et al., 2022). [...] Every model is trained for the same 10 random splits, and the optimization protocol is fixed across all experiments following previous work (Miao et al., 2022a) and using the Adam optimizer (Kingma & Ba, 2015).
Hardware Specification | Yes | Experiments are run on two different Linux machines, with CUDA 12.6 and a single NVIDIA GeForce RTX 4090, or with CUDA 12.0 and a single NVIDIA TITAN V.
Software Dependencies | Yes | Our implementation is done using PyTorch 2.4.1 (Paszke et al., 2017) and PyG 2.4.0 (Fey & Lenssen, 2019).
Experiment Setup | Yes | Model hyper-parameters. We set the weight of the explanation regularization as follows: for GISST, we weight all regularization terms by 0.01 in the final loss; for SMGNN, we set the L1 and entropy regularization to 1.0 and 0.8, respectively; for GSAT, we set the value of r to 0.7 for GOODMotif, MNIST75sp, Graph-SST2, and BBBP, to 0.5 for Topo Feature, AIDS, AIDSC1, and MUTAG, and to 0.3 for Red Blue Nodes. Also, for GSAT the decay of r is applied every 10 steps for every dataset, except for Graph-SST2 and GOODMotif where it is applied every 20. Then, the parameter λ regulating the weight of the regularization is set to 0.001 for all experiments with SMGNN, and to 1 for GSAT on every dataset except Red Blue Nodes. [...] For each model, we set the hidden dimension of GNN layers to 64 for MUTAG, 300 for GOODMotif, BBBP, and Graph-SST2, and 100 otherwise. Similarly, we use a dropout value of 0.5 for GOODMotif and Graph-SST2, of 0.3 for MNIST75sp, MUTAG, and BBBP, and of 0.0 otherwise. [...] Every model is trained for the same 10 random splits, and the optimization protocol is fixed across all experiments following previous work (Miao et al., 2022a) and using the Adam optimizer (Kingma & Ba, 2015). Also, for experiments with Dual-Channel GNN, we fix an initial warmup of 20 epochs during which the two channels are trained independently to output the ground-truth label. After this warmup, the overall model is trained as a whole. The total number of epochs is fixed to 100 for every dataset, except for Graph-SST2 where it is set to 200.
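The Red Blue Nodes task quoted in the Open Datasets row (random graphs labeled by the majority node color) is simple enough to sketch with the standard library. The generator below is purely illustrative; the function name, the edge probability, and the tie re-sampling are assumptions, not the authors' data pipeline.

```python
import random

def make_red_blue_instance(num_nodes, edge_prob, rng):
    """Illustrative generator for a Red Blue Nodes style example:
    a random graph whose label is the more frequent node color."""
    # Color each node red or blue uniformly at random,
    # re-sampling on ties so the majority label is well defined.
    while True:
        colors = [rng.choice(["red", "blue"]) for _ in range(num_nodes)]
        if colors.count("red") != colors.count("blue"):
            break
    # Random edge set over the colored nodes (each pair is connected
    # independently with probability edge_prob).
    edges = [(u, v)
             for u in range(num_nodes)
             for v in range(u + 1, num_nodes)
             if rng.random() < edge_prob]
    label = "red" if colors.count("red") > colors.count("blue") else "blue"
    return colors, edges, label

rng = random.Random(0)
colors, edges, label = make_red_blue_instance(9, 0.3, rng)
```

Since the label depends only on node colors and not on topology, a purely topological explainer has nothing meaningful to attribute here, which is the point of including this dataset.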
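The protocol of training every model on the same 10 random splits (Dataset Splits row) can be reproduced by deriving each split from a fixed seed. A minimal stdlib-only sketch follows; the 80/10/10 fractions and the seed-per-split scheme are assumptions for illustration, not values taken from the paper.

```python
import random

def make_splits(num_graphs, num_splits=10, train_frac=0.8, val_frac=0.1):
    """Seed-controlled splitter: regenerating with the same seeds
    yields identical train/val/test partitions for every model."""
    splits = []
    for seed in range(num_splits):
        rng = random.Random(seed)  # one fixed seed per split
        idx = list(range(num_graphs))
        rng.shuffle(idx)
        n_train = int(train_frac * num_graphs)
        n_val = int(val_frac * num_graphs)
        splits.append({
            "train": idx[:n_train],
            "val": idx[n_train:n_train + n_val],
            "test": idx[n_train + n_val:],
        })
    return splits

splits = make_splits(100)
```

Deriving splits deterministically from seeds, rather than storing index files, keeps the comparison fair across models without shipping extra artifacts.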
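The per-dataset hyper-parameters quoted in the Experiment Setup row (hidden dimension, dropout, GSAT's r) can be collected into a single lookup. The helper below only reorganizes the reported values; the function name and the exact dataset-name strings are illustrative assumptions, not the authors' code.

```python
def model_hparams(dataset):
    """Per-dataset hyper-parameters as reported in the paper's setup."""
    # Hidden dimension: 64 for MUTAG; 300 for GOODMotif, BBBP,
    # Graph-SST2; 100 otherwise.
    hidden = {"MUTAG": 64, "GOODMotif": 300, "BBBP": 300,
              "Graph-SST2": 300}.get(dataset, 100)
    # Dropout: 0.5 for GOODMotif, Graph-SST2; 0.3 for MNIST75sp,
    # MUTAG, BBBP; 0.0 otherwise.
    dropout = {"GOODMotif": 0.5, "Graph-SST2": 0.5, "MNIST75sp": 0.3,
               "MUTAG": 0.3, "BBBP": 0.3}.get(dataset, 0.0)
    # GSAT's r: 0.7 / 0.5 / 0.3 depending on the dataset group.
    if dataset in {"GOODMotif", "MNIST75sp", "Graph-SST2", "BBBP"}:
        gsat_r = 0.7
    elif dataset in {"TopoFeature", "AIDS", "AIDSC1", "MUTAG"}:
        gsat_r = 0.5
    else:  # Red Blue Nodes
        gsat_r = 0.3
    return {"hidden_dim": hidden, "dropout": dropout, "gsat_r": gsat_r}
```

Centralizing the reported values in one place makes it easy to cross-check a reimplementation against the paper's configuration row by row.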