Evaluating explainability techniques on discrete-time graph neural networks

Authors: Manuel Dileo, Matteo Zignani, Sabrina Tiziana Gaito

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we outline the best explainability techniques for discrete-time GNNs in terms of fidelity, efficiency, and human-readability trade-offs.
Researcher Affiliation | Academia | Manuel Dileo (EMAIL), Department of Computer Science, University of Milan, Milan, Italy; Matteo Zignani (EMAIL), Department of Computer Science, University of Milan, Milan, Italy; Sabrina Gaito (EMAIL), Department of Computer Science, University of Milan, Milan, Italy
Pseudocode | Yes | Algorithm 1: Evaluating discrete-time GNNs explanations
Open Source Code | Yes | Code and Supplementary Material are available in a GitHub repository: https://github.com/manuel-dileo/dtgnn-explainer
Open Datasets | Yes | Datasets. We evaluate the explainability models on three well-known real-world temporal graph datasets: Bitcoin OTC (Pareja et al., 2020), Reddit-title (You et al., 2022), and Email-EU (Paranjape et al., 2017), covering three of the most important application domains of discrete-time networks (financial, social, and collaboration networks), plus a recent temporal network dataset of protein interactions (Fu & He, 2022).
Dataset Splits | Yes | Configuration. We split each dataset chronologically into train, validation, and test sets using 70/10/20 of the snapshots.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU models, memory specifications) for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for the tools and libraries used in the experiments.
Experiment Setup | Yes | Configuration. We split each dataset chronologically into train, validation, and test sets using 70/10/20 of the snapshots. We randomly sample 50 target events to explain per snapshot. Note that this choice leads to testing explainability methods on up to five times the number of instances used in previous works (Xia et al., 2023; Chen & Ying, 2023), which was 500 overall. For each target, the candidate events are all the edges that appear in a time window immediately before the event, whose size is set to 10% of the snapshots in the dataset. Following previous works (Amara et al., 2022), to compare different explainability techniques, we cap the explanation sparsity at a maximum size of 20 events. All explainers are trained for 200 epochs.
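The evaluation configuration described in the table (70/10/20 chronological split, 50 sampled targets per snapshot, a candidate window of 10% of the snapshots, and a 20-event sparsity cap) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: function names, the snapshot representation (a list of edge lists), and the use of Python's `random` module are all assumptions.

```python
import random

# Illustrative sketch of the evaluation setup; names and data layout
# are assumptions, not taken from the authors' repository.

def chronological_split(num_snapshots):
    """70/10/20 chronological split over snapshot indices."""
    train_end = int(0.7 * num_snapshots)
    val_end = train_end + int(0.1 * num_snapshots)
    train = list(range(0, train_end))
    val = list(range(train_end, val_end))
    test = list(range(val_end, num_snapshots))
    return train, val, test

def candidate_events(snapshots, t, window_frac=0.10):
    """All edges appearing in the time window immediately before
    snapshot t; the window size is 10% of the total snapshot count."""
    window = max(1, int(window_frac * len(snapshots)))
    start = max(0, t - window)
    return [edge for snap in snapshots[start:t] for edge in snap]

def sample_targets(snapshot_edges, k=50, seed=0):
    """Randomly sample (up to) 50 target events per snapshot to explain."""
    rng = random.Random(seed)
    return rng.sample(snapshot_edges, min(k, len(snapshot_edges)))

# Sparsity level: each explanation contains at most 20 candidate events.
MAX_EXPLANATION_SIZE = 20
```

Under this sketch, a dataset with ten snapshots yields seven training, one validation, and two test snapshots, and each target at snapshot t is explained using only edges from the single preceding snapshot.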