Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Global Graph Counterfactual Explanation: A Subgraph Mapping Approach
Authors: Yinhan He, Wendy Zheng, Yaochen Zhu, Jing Ma, Saumitra Mishra, Natraj Raman, Ninghao Liu, Jundong Li
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of our Global GCE compared to existing baselines. Our code can be found at https://github.com/YinhanHe123/GlobalGCE. [...] In this section, we evaluate Global GCE with extensive experiments on five real-world datasets. Our experiments aim to answer the following research questions: Quantitative Analysis: How does Global GCE perform w.r.t. the evaluation metrics compared with the state-of-the-art baselines; How do different components in Global GCE contribute to the performance? Qualitative Analysis: How does Global GCE [...] |
| Researcher Affiliation | Collaboration | Yinhan He EMAIL Department of Electrical and Computer Engineering University of Virginia [...] Saumitra Mishra EMAIL JP Morgan Chase & Co. |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Extensive experiments demonstrate the superiority of our Global GCE compared to existing baselines. Our code can be found at https://github.com/YinhanHe123/GlobalGCE. |
| Open Datasets | Yes | Our experiments utilize five real-world datasets (NCI1, Mutagenicity, AIDS, ENZYMES, PROTEINS) from TUDataset (Morris et al., 2020), where graphs represent chemical compounds with nodes as atoms and edges as bonds. |
| Dataset Splits | Yes | The train/validate/test split is 50%/25%/25%. GNN accuracies are in Appendix D.1.1. |
| Hardware Specification | No | The paper lists software dependencies but does not specify any particular hardware (CPU/GPU models, memory, etc.) used for the experiments. It only states: "All the experiments are conducted in the following environment:" |
| Software Dependencies | Yes | All the experiments are conducted in the following environment: Python==3.9 Pytorch==1.11.0 torch-geometric==2.1.0 torch-scatter==2.0.9 torch-sparse==0.6.15 Scipy==1.9.3 Networkx==3.0 Numpy==1.23.4 gspan-mining==0.2.3 |
| Experiment Setup | Yes | The model is trained with SGD with a learning rate of 1e-3 for 500 epochs. The train/validate/test split is 50%/25%/25%. [...] In the frequent subgraph generation, we set the minimum and maximum number of nodes to three and twenty respectively. The minimum appearance rate τ for different datasets of the frequent subgraphs is shown in the Appendix. For the counterfactual subgraph autoencoder, we allow at most two CSMs to be applied on the same input graphs to avoid combinatorial complexity (if m CSMs are applicable to one input graph, $\binom{m}{k}$ combinations need to be evaluated). We set the latent space dimension as 64 and the dropout rate for the autoencoder to be 0.5. Please refer to more hyperparameters in Appendix D.1.2. [...] We set α = 10, β = γ = 1 to emphasize the counterfactual's structural change. |
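The experiment-setup row above can be collected into a configuration sketch. This is a hypothetical illustration, not the authors' code: the `SubgraphAutoencoder` class, its layer sizes, the placeholder loss terms, and the random input batch are all assumptions; only the hyperparameter values (SGD, lr 1e-3, 500 epochs, latent dimension 64, dropout 0.5, α = 10, β = γ = 1) come from the table.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class SubgraphAutoencoder(nn.Module):
    """Hypothetical stand-in for the paper's counterfactual subgraph
    autoencoder; architecture details are assumed, not reported."""
    def __init__(self, in_dim=128, latent_dim=64, dropout=0.5):
        super().__init__()
        # Latent dimension 64 and dropout 0.5, as reported in the table.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, latent_dim), nn.ReLU(), nn.Dropout(dropout))
        self.decoder = nn.Linear(latent_dim, in_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SubgraphAutoencoder()
# SGD with learning rate 1e-3, as reported.
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
# Loss weights from the table; emphasis on the structural term.
alpha, beta, gamma = 10.0, 1.0, 1.0

x = torch.randn(32, 128)  # placeholder batch of graph representations
for epoch in range(500):  # 500 epochs, as reported
    opt.zero_grad()
    recon = model(x)
    # Placeholder loss terms; the paper's actual objective differs.
    l_structural = nn.functional.mse_loss(recon, x)
    l_reg1 = recon.abs().mean()
    l_reg2 = model.decoder.weight.norm()
    loss = alpha * l_structural + beta * l_reg1 + gamma * l_reg2
    loss.backward()
    opt.step()
```

The 50%/25%/25% train/validate/test split and the frequent-subgraph bounds (3–20 nodes, per-dataset τ) would be applied to the dataset pipeline upstream of this loop.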