Explainable Graph Neural Networks via Structural Externalities

Authors: Lijun Wu, Dong Hao, Zhiyi Fan

IJCAI 2025

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | Experimental studies on both synthetic and real-world datasets show that GraphEXT outperforms existing baseline methods in terms of fidelity across diverse GNN architectures, significantly enhancing the explainability of GNN models.
Researcher Affiliation | Academia | Lijun Wu (1), Dong Hao (1,2) and Zhiyi Fan (1); (1) SCSE, University of Electronic Science and Technology of China; (2) AI-HSS, University of Electronic Science and Technology of China; EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Compute the Network Based Value Function; Algorithm 2: Shapley Value Sampling and Estimation
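The paper's Algorithm 2 (Shapley Value Sampling and Estimation) is not reproduced here, but permutation-based Monte Carlo Shapley estimation generally follows the pattern below. The value function is a toy stand-in (counting edges retained inside a coalition on a fixed 4-node path graph), not the paper's network-based v(S):

```python
import random

def shapley_sample(players, value_fn, num_samples=2000, seed=0):
    """Monte Carlo Shapley estimation via random permutations.

    For each sampled permutation, a player's marginal contribution is
    v(predecessors + player) - v(predecessors); averaging these over
    permutations converges to the exact Shapley value.
    """
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(num_samples):
        perm = players[:]
        rng.shuffle(perm)
        coalition = set()
        prev = value_fn(coalition)
        for p in perm:
            coalition.add(p)
            cur = value_fn(coalition)
            phi[p] += cur - prev
            prev = cur
    return {p: s / num_samples for p, s in phi.items()}

# Toy value function standing in for the paper's network-based v(S):
# a coalition's value is the number of edges of a fixed path graph
# whose endpoints both lie in the coalition (an illustrative assumption).
EDGES = {(0, 1), (1, 2), (2, 3)}

def toy_value(coalition):
    return sum(1 for u, v in EDGES if u in coalition and v in coalition)

print(shapley_sample([0, 1, 2, 3], toy_value))
```

In this toy game each edge splits its unit of value evenly between its two endpoints, so the estimates approach 0.5 for the end nodes and 1.0 for the middle nodes, and the estimates always sum exactly to v(N) = 3 (the efficiency property holds per permutation).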
Open Source Code | Yes | The full version of this paper and source code are at this link.
Open Datasets | Yes | We evaluated the effectiveness of GraphEXT using six datasets from node classification and graph-level classification tasks. The statistical details of these datasets are shown in Table 1. These datasets include synthetic datasets, sentiment graph datasets, and biological datasets. BA-Shapes [Ying et al., 2019] is designed for node classification tasks; it is based on a Barabási-Albert graph with added house-like patterns. ... BA-2Motifs [Luo et al., 2020] is used for graph classification tasks... Graph-SST2 and Graph-Twitter [Yuan et al., 2022] are used for graph classification tasks... BBBP and ClinTox [Wu et al., 2018] are designed for graph classification tasks...
Dataset Splits | No | For each dataset, we conducted quantitative calculations of the Fidelity metrics using test samples on trained GCN and GIN models to demonstrate the effectiveness of our method. The paper mentions using "test samples" but does not provide specific details on the train/validation/test splits, such as percentages, absolute counts, or references to predefined splits for reproducibility.
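The Fidelity metric referenced above is commonly defined (e.g., in the explainability literature the paper builds on) as the drop in the model's predicted-class score once the explanation subgraph is removed from the input. A minimal sketch, with a toy stand-in for the trained GNN (the actual metric would use the GCN/GIN output probabilities):

```python
# Toy setup: a "graph" is a frozenset of node ids, and the stand-in
# "model" scores a graph by the fraction of motif nodes it contains.
# MOTIF and toy_predict are illustrative assumptions, not the paper's model.
MOTIF = {3, 4, 5}

def toy_predict(nodes):
    return len(nodes & MOTIF) / len(MOTIF)

def remove_nodes(nodes, removed):
    return nodes - removed

def fidelity_plus(predict, graph, explanation):
    """Fidelity+ : drop in the predicted-class score after removing the
    explanation subgraph. Higher means the explanation captures the
    structure the model actually relies on."""
    return predict(graph) - predict(remove_nodes(graph, explanation))

g = frozenset(range(8))
print(fidelity_plus(toy_predict, g, {3, 4, 5}))  # removes the whole motif -> 1.0
print(fidelity_plus(toy_predict, g, {0, 1}))     # irrelevant nodes -> 0.0
```

A faithful explainer should select a subgraph whose removal causes a large score drop, which is why Fidelity is evaluated on held-out test samples.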
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processor types, or memory) used for running the experiments are provided in the paper.
Software Dependencies | No | In our experiments, we trained on all datasets using three-layer GCN [Kipf and Welling, 2016] and GIN [Xu et al., 2018] models. ... The datasets and baseline implementations were based on the DIG Library [Liu et al., 2021]. The paper mentions GCN and GIN models, and the DIG Library, but does not provide specific version numbers for these software components.
Experiment Setup | No | In our experiments, we trained on all datasets using three-layer GCN [Kipf and Welling, 2016] and GIN [Xu et al., 2018] models. The model with the highest accuracy on the test set was selected as our final model, and all models were trained to achieve competitive accuracy. The paper mentions the use of three-layer GCN and GIN models and the model selection criterion, but it does not provide specific hyperparameters or detailed training configurations (e.g., learning rate, batch size, optimizer settings) needed for reproducibility.
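For context, each layer of the three-layer GCN backbone applies the standard propagation rule H' = ReLU(D^-1/2 (A+I) D^-1/2 H W) from Kipf and Welling [2016]. A dependency-free sketch with arbitrary placeholder weights (not the paper's trained parameters or hyperparameters):

```python
import math

def matmul(x, y):
    """Plain-Python matrix product for the small example below."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def gcn_layer(adj, h, w):
    """One GCN layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops, then symmetrically normalise by node degree.
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    norm = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    return [[max(0.0, v) for v in row] for row in matmul(matmul(norm, h), w)]

# Three stacked layers on a 3-node path graph, mirroring the three-layer
# backbone described in the paper (weights are illustrative placeholders).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = [[1.0], [0.0], [1.0]]
w = [[1.0]]
for _ in range(3):
    h = gcn_layer(adj, h, w)
print(h)
```

In practice the models would be built with a graph library (the paper uses the DIG Library for datasets and baselines); this sketch only makes the per-layer computation concrete.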