Graph Inverse Style Transfer for Counterfactual Explainability
Authors: Bardh Prenkaj, Efstratios Zaradoukas, Gjergji Kasneci
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 8 benchmark datasets spanning synthetic and real-world graphs with binary and multiclass classification tasks emphasize GIST as consistently outperforming SoTA. Specifically, GIST achieves considerably higher validity (+7.6% over the second-best) and improves fidelity by a large margin (+45.5%). Our results highlight GIST's ability to generate counterfactuals that are both more faithful and spectrally aligned (preserved semantics) with the input. |
| Researcher Affiliation | Academia | ¹Technical University of Munich, Germany; ²Sapienza University of Rome, Italy. Correspondence to: Bardh Prenkaj <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Forward learning pass of GIST |
| Open Source Code | Yes | Code: https://github.com/bardhprenkaj/gist |
| Open Datasets | Yes | Extensive experiments on 8 benchmark datasets spanning synthetic and real-world graphs with binary and multiclass classification tasks emphasize GIST as consistently outperforming SoTA. Specifically, GIST achieves considerably higher validity (+7.6% over the second-best) and improves fidelity by a large margin (+45.5%). Our results highlight GIST's ability to generate counterfactuals that are both more faithful and spectrally aligned (preserved semantics) with the input. |
| Dataset Splits | Yes | We use a 90:10 train-test split for all explainers and designate 10% of the training set as validation. We perform 5-fold cross-validation to assess the performance of the explainers on one AMD EPYC 7002/3 64-Core CPU (for smaller models) and one Nvidia Tesla V100 GPU (for larger models), totaling 450h of execution time. |
| Hardware Specification | Yes | We perform 5-fold cross-validation to assess the performance of the explainers on one AMD EPYC 7002/3 64-Core CPU (for smaller models) and one Nvidia Tesla V100 GPU (for larger models), totaling 450h of execution time. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'RMS Propagation optimizer' with specific learning rates but does not provide specific version numbers for a comprehensive software stack (e.g., Python, PyTorch, CUDA, or other libraries). |
| Experiment Setup | Yes | We configured GIST to run the backtracking process for 50 epochs with a batch size of 16. We chose the number of attention heads to be 2 and the node embedding dimension to be 16. We set α = 0.9 to encourage higher validity, which is beneficial for a helpful counterfactual. We train GIST with the Adam optimizer with learning rate 10⁻³ and a weight decay of 10⁻⁵. For CF2 (Tan et al., 2022), we configured: 20 epochs, batch size ratio of 0.2, learning rate (lr) initialized at 0.02, and regularization parameters α = 0.7, λ = 20, and γ = 0.9. CF-GNNExp (Lucic et al., 2022) utilized: α = 0.01, K = 5, β = 0.6, and γ = 0.2. CLEAR (Ma et al., 2022) employed: 10 epochs, learning rate (lr) of 0.01, counterfactual loss regularization parameter (λ_cfe) set to 0.1, trade-off parameter α = 0.4, and batch size 32. RSGG-CE (Prado-Romero et al., 2024b) was trained for 500 epochs with a GAN configuration: batch size 1 and a TopKPooling discriminator. Concerning the oracle implementation, we used the following hyperparameters: 50 epochs, batch size 32, and early stopping threshold 10⁻⁴. We trained the model using the RMSProp optimizer (learning rate lr = 0.01) with Cross Entropy loss. The architecture consisted of a Graph Convolutional Neural Network with 3 convolutional layers and 1 dense layer, convolutional booster 2, and linear decay factor 1.8. |
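The evaluation protocol quoted above (90:10 train-test split, 10% of training as validation, repeated over 5 folds) can be sketched as follows. This is a minimal illustration in plain Python; the function and seed handling are our own, not the paper's implementation.

```python
import random

def split_indices(n, seed=0):
    """Sketch of the paper's protocol: 90:10 train-test split,
    with 10% of the training set held out as validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = n // 10                       # 10% test
    test, train_full = idx[:n_test], idx[n_test:]
    n_val = len(train_full) // 10          # 10% of training as validation
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

# The paper repeats the evaluation over 5 folds; here we vary the seed.
folds = [split_indices(1000, seed=k) for k in range(5)]
```

In practice one would use a library utility such as scikit-learn's `KFold` for the cross-validation loop; the sketch only shows the proportions reported in the paper.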
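The reported hyperparameters for GIST and the oracle classifier can be collected into a configuration summary. The values below are taken directly from the quoted setup; the dictionary keys and structure are our own illustrative choice, not the authors' code.

```python
# Hyperparameters for GIST as reported in the setup.
GIST_CONFIG = {
    "epochs": 50,
    "batch_size": 16,
    "attention_heads": 2,
    "node_embedding_dim": 16,
    "alpha": 0.9,                 # set to encourage higher validity
    "optimizer": "Adam",
    "lr": 1e-3,
    "weight_decay": 1e-5,
}

# Hyperparameters for the oracle (a 3-conv-layer GCN with 1 dense layer).
ORACLE_CONFIG = {
    "epochs": 50,
    "batch_size": 32,
    "early_stopping_threshold": 1e-4,
    "optimizer": "RMSProp",
    "lr": 0.01,
    "loss": "CrossEntropy",
    "conv_layers": 3,
    "dense_layers": 1,
    "conv_booster": 2,
    "linear_decay": 1.8,
}
```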