Graph Inverse Style Transfer for Counterfactual Explainability

Authors: Bardh Prenkaj, Efstratios Zaradoukas, Gjergji Kasneci

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on 8 benchmark datasets spanning synthetic and real-world graphs with binary and multiclass classification tasks emphasize GIST as consistently outperforming SoTA. Specifically, GIST achieves considerably higher validity (+7.6% over the second-best) and improves fidelity by a large margin (+45.5%). Our results highlight GIST's ability to generate counterfactuals that are both more faithful and spectrally aligned (preserving semantics) with the input.
Researcher Affiliation Academia Technical University of Munich, Germany; Sapienza University of Rome, Italy. Correspondence to: Bardh Prenkaj <EMAIL>.
Pseudocode Yes Algorithm 1 Forward learning pass of GIST
Open Source Code Yes Code: https://github.com/bardhprenkaj/gist
Open Datasets Yes Extensive experiments on 8 benchmark datasets spanning synthetic and real-world graphs with binary and multiclass classification tasks emphasize GIST as consistently outperforming SoTA. Specifically, GIST achieves considerably higher validity (+7.6% over the second-best) and improves fidelity by a large margin (+45.5%). Our results highlight GIST's ability to generate counterfactuals that are both more faithful and spectrally aligned (preserving semantics) with the input.
Dataset Splits Yes We use a 90:10 train-test split for all explainers and designate 10% of the training set as validation. We perform 5-fold cross-validation to assess the performance of the explainers on one AMD EPYC 7002/3 64-Core CPU (for smaller models) and one Nvidia TESLA V100 GPU (for larger models), totaling 450h of execution time.
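The split protocol quoted above (90:10 train-test, with 10% of the training portion held out for validation, evaluated over 5 folds) can be sketched as follows. This is a hedged, stdlib-only illustration, not the paper's code: the dataset size, the per-fold reseeding (one plausible reading of "5-fold cross validations"), and the function name `split_indices` are all our own assumptions.

```python
# Hedged sketch of the evaluation protocol: 90:10 train-test split,
# then 10% of the training set reserved as validation, over 5 folds.
# The index list and seed handling are illustrative assumptions.
import random

def split_indices(n, test_frac=0.10, val_frac=0.10, seed=0):
    """Return (train, val, test) index lists for one fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)        # 10% held-out test set
    test, rest = idx[:n_test], idx[n_test:]
    n_val = int(len(rest) * val_frac)  # 10% of the training set as validation
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# Five folds: re-split with a different seed per fold (assumed reading).
for fold in range(5):
    train, val, test = split_indices(200, seed=fold)
    print(f"fold {fold}: train={len(train)} val={len(val)} test={len(test)}")
```

With 200 graphs this yields 20 test, 18 validation, and 162 training indices per fold, matching the 90:10 outer split and 10% inner validation ratio.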
Hardware Specification Yes We perform 5-fold cross-validation to assess the performance of the explainers on one AMD EPYC 7002/3 64-Core CPU (for smaller models) and one Nvidia TESLA V100 GPU (for larger models), totaling 450h of execution time.
Software Dependencies No The paper mentions using 'Adam optimizer' and 'RMS Propagation optimizer' with specific learning rates but does not provide specific version numbers for a comprehensive software stack (e.g., Python, PyTorch, CUDA, or other libraries).
Experiment Setup Yes We configured GIST to run the backtracking process for 50 epochs with a batch size of 16. We set the number of attention heads to 2 and the node embedding dimension to 16. We set α = 0.9 to encourage higher validity, which is beneficial for a helpful counterfactual. We train GIST with the Adam optimizer with learning rate 10^-3 and a weight decay of 10^-5. For CF2 (Tan et al., 2022), we configured: 20 epochs, batch size ratio of 0.2, learning rate (lr) initialized at 0.02, and regularization parameters α = 0.7, λ = 20, and γ = 0.9. CF-GNNExp (Lucic et al., 2022) utilized: α = 0.01, K = 5, β = 0.6, and γ = 0.2. CLEAR (Ma et al., 2022) employed: 10 epochs, learning rate (lr) of 0.01, counterfactual loss regularization parameter (λ_cfe) set to 0.1, trade-off parameter α = 0.4, and batch size 32. RSGG-CE (Prado-Romero et al., 2024b) was trained for 500 epochs with a GAN configuration: batch size 1 and a TopKPooling discriminator. Concerning the oracle implementation, we used the following hyperparameters: 50 epochs, batch size 32, and early-stopping threshold 10^-4. We trained the model using the RMSProp optimizer (learning rate lr = 0.01) with cross-entropy loss. The architecture consisted of a graph convolutional neural network with 3 convolutional layers and 1 dense layer, convolutional booster 2, and linear decay factor 1.8.
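The GIST and oracle hyperparameters quoted above can be collected into plain config dicts, which makes the reported values easy to check at a glance. This is only a transcription sketch: the dict key names (`attention_heads`, `conv_booster`, etc.) are our own labels, not identifiers from the GIST codebase.

```python
# Hyperparameters transcribed from the experiment-setup row above.
# Key names are illustrative assumptions; values come from the text.
gist_config = {
    "epochs": 50,            # backtracking process length
    "batch_size": 16,
    "attention_heads": 2,
    "node_embedding_dim": 16,
    "alpha": 0.9,            # validity trade-off weight
    "optimizer": "Adam",
    "lr": 1e-3,
    "weight_decay": 1e-5,
}

oracle_config = {
    "epochs": 50,
    "batch_size": 32,
    "early_stopping_threshold": 1e-4,
    "optimizer": "RMSprop",
    "lr": 0.01,
    "loss": "cross_entropy",
    "conv_layers": 3,        # GCN with 3 convolutional layers
    "dense_layers": 1,
    "conv_booster": 2,
    "linear_decay": 1.8,
}
```

A reproduction attempt would feed these values into the corresponding optimizer and model constructors; keeping them in one place makes it straightforward to diff against the released code at the repository linked above.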