Redundancy Undermines the Trustworthiness of Self-Interpretable GNNs
Authors: Wenxin Tai, Ting Zhong, Goce Trajcevski, Fan Zhou
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our findings through extensive experiments across diverse datasets, model architectures, and self-interpretable GNN frameworks, providing a benchmark to guide future research on addressing redundancy and advancing GNN deployment in critical domains. |
| Researcher Affiliation | Academia | 1Department of Software Engineering, University of Electronic Science and Technology of China, China 2Department of Electrical and Computer Engineering, Iowa State University, United States. Correspondence to: Fan Zhou <EMAIL>. |
| Pseudocode | No | The paper describes methods and processes using mathematical formulas and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/ICDM-UESTC/TrustworthyExplanation. |
| Open Datasets | Yes | We select four datasets: one synthetic dataset, BA-2MOTIFS (Luo et al., 2020), and three real-world datasets, MR (Rao et al., 2022), BENZENE (Morris et al., 2020), and MUTAGENICITY (Morris et al., 2020), all sourced from the graph learning community and all with ground-truth explanation labels. All datasets are published and can be downloaded from the Internet (see Table 5). |
| Dataset Splits | No | The paper mentions using a 'validation set' for hyperparameter selection and running methods multiple times with 'random seeds', but it does not specify explicit percentages or counts for training, validation, and test splits required for reproducing the exact data partitioning. |
| Hardware Specification | Yes | All experiments were conducted using PyTorch, trained with the Adam optimizer (Kingma & Ba, 2015), and executed on one NVIDIA RTX 4090 GPU with an Intel Core i7-13700KF CPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' as the framework used and the 'Adam optimizer', but it does not provide specific version numbers for these or other key software components. |
| Experiment Setup | Yes | GIN consists of 2 layers with a hidden size of 64, while GCN has 3 layers with the same hidden size. We employ a 3-layer Multi-Layer Perceptron (MLP) to predict edge weights, with hidden sizes set to 256, 64, and 1. The learning rate is chosen from {0.01, 0.005, 0.001, 0.0005, 0.0001}. The coefficient for EA is selected from {0.01, 0.1, 1, 10, 100}. We start using SWA from the 10th epoch. The hyperparameters (e.g., β, γ) are selected from {0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100}. The Gumbel-Softmax technique is used, with each edge weight computed as e_ij = σ((log ϵ − log(1 − ϵ) + w_ij)/τ), where ϵ ∼ Uniform(0, 1). |
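The Gumbel-Softmax (binary concrete) relaxation quoted in the setup row can be sketched in a few lines. This is a minimal stdlib-only illustration, not the authors' implementation: `w_ij` and `tau` follow the paper's symbols, while the function name and the injectable `rng` hook are ours, added so the stochastic draw can be pinned down for testing.

```python
import math
import random


def gumbel_sigmoid(w_ij: float, tau: float = 1.0, rng=random.random) -> float:
    """Relaxed Bernoulli edge weight: e_ij = sigmoid((log e - log(1 - e) + w_ij) / tau).

    A draw eps ~ Uniform(0, 1) is mapped through the logistic (Gumbel-difference)
    reparameterization, so the edge mask stays differentiable in w_ij while the
    temperature tau controls how close the output sits to a hard 0/1 decision.
    """
    eps = rng()  # eps ~ Uniform(0, 1)
    logit = (math.log(eps) - math.log(1.0 - eps) + w_ij) / tau
    return 1.0 / (1.0 + math.exp(-logit))
```

With `eps` fixed at 0.5 the noise term vanishes and the output reduces to `sigmoid(w_ij / tau)`; lowering `tau` sharpens the relaxation toward a binary edge mask, which matches the role of the temperature τ in the quoted formula.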