Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Global Graph Counterfactual Explanation: A Subgraph Mapping Approach

Authors: Yinhan He, Wendy Zheng, Yaochen Zhu, Jing Ma, Saumitra Mishra, Natraj Raman, Ninghao Liu, Jundong Li

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superiority of our Global GCE compared to existing baselines. Our code can be found at https://github.com/YinhanHe123/GlobalGCE. [...] In this section, we evaluate Global GCE with extensive experiments on five real-world datasets. Our experiments aim to answer the following research questions: Quantitative Analysis: How does Global GCE perform w.r.t. the evaluation metrics compared with the state-of-the-art baselines; How do different components in Global GCE contribute to the performance? Qualitative Analysis: How does Global GCE [...]
Researcher Affiliation | Collaboration | Yinhan He EMAIL Department of Electrical and Computer Engineering University of Virginia [...] Saumitra Mishra EMAIL JP Morgan Chase & Co.
Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Extensive experiments demonstrate the superiority of our Global GCE compared to existing baselines. Our code can be found at https://github.com/YinhanHe123/GlobalGCE.
Open Datasets | Yes | Our experiments utilize five real-world datasets (NCI1, Mutagenicity, AIDS, ENZYMES, PROTEINS) from TUDataset (Morris et al., 2020), where graphs represent chemical compounds with nodes as atoms and edges as bonds.
Dataset Splits | Yes | The train/validate/test split is 50%/25%/25%. GNN accuracies are in Appendix D.1.1.
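The 50%/25%/25% split quoted above can be reproduced with a simple shuffled index split. This is an illustrative sketch only (the function name, seed handling, and split logic below are assumptions, not the authors' code; the exact procedure lives in the linked repository):

```python
import random

def split_indices(n, seed=0):
    """Shuffle dataset indices and split them 50%/25%/25% into train/val/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for reproducibility
    n_train, n_val = n // 2, n // 4
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(1000)
```

Any graph-classification dataset index range can be substituted for the placeholder size of 1000.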
Hardware Specification | No | The paper lists software dependencies but does not specify any particular hardware (CPU/GPU models, memory, etc.) used for the experiments. It only states: "All the experiments are conducted in the following environment:"
Software Dependencies | Yes | All the experiments are conducted in the following environment: Python==3.9 Pytorch==1.11.0 torch-geometric==2.1.0 torch-scatter==2.0.9 torch-sparse==0.6.15 Scipy==1.9.3 Networkx==3.0 Numpy==1.23.4 gSpan-mining==0.2.3
Experiment Setup | Yes | The model is trained with SGD with a learning rate of 1e-3 for 500 epochs. The train/validate/test split is 50%/25%/25%. [...] In the frequent subgraph generation, we set the minimum and maximum number of nodes to three and twenty, respectively. The minimum appearance rate τ for different datasets of the frequent subgraphs is shown in the Appendix. For the counterfactual subgraph autoencoder, we allow at most two CSMs to be applied on the same input graphs to avoid combinatorial complexity (if m CSMs are applicable to one input graph, m-choose-k combinations need to be evaluated). We set the latent space dimension to 64 and the dropout rate for the autoencoder to 0.5. Please refer to more hyperparameters in Appendix D.1.2. [...] We set α = 10, β = γ = 1 to emphasize the counterfactual's structural change.
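The hyperparameters quoted above can be collected into a single configuration for a reproduction attempt. The key names below are illustrative assumptions (the paper reports the values but not a config schema); remaining settings are in Appendix D.1.2 of the paper:

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are hypothetical; only the values come from the source text.
config = {
    "optimizer": "SGD",
    "lr": 1e-3,                 # learning rate
    "epochs": 500,
    "split": (0.50, 0.25, 0.25),  # train / validate / test
    "min_subgraph_nodes": 3,    # frequent subgraph generation bounds
    "max_subgraph_nodes": 20,
    "max_csms_per_graph": 2,    # cap to avoid m-choose-k combinatorial blowup
    "latent_dim": 64,           # autoencoder latent space dimension
    "dropout": 0.5,
    "alpha": 10.0,              # weight emphasizing the structural-change term
    "beta": 1.0,
    "gamma": 1.0,
}
```

The minimum appearance rate τ is dataset-specific and is therefore omitted here; see the paper's Appendix.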