Adversarial Robustness of Graph Transformers

Authors: Philipp Foth, Lukas Gosch, Simon Geisler, Leo Schwinn, Stephan Günnemann

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our attacks on multiple tasks and perturbation models, including structure perturbations for node and graph classification, and node injection for graph classification. Our results reveal that GTs can be catastrophically fragile in many cases. Addressing this vulnerability, we show how our adaptive attacks can be effectively used for adversarial training, substantially improving robustness.
Researcher Affiliation | Academia | Philipp Foth EMAIL, School of Computation, Information and Technology, Technical University of Munich
Pseudocode | Yes | Algorithm 1: Our k-step free adversarial training
Open Source Code | Yes | The code to reproduce our results can be found at https://github.com/isefos/gt_robustness.
Open Datasets | Yes | We first evaluate our structure attacks on CLUSTER (Dwivedi et al., 2023) [...] We also consider the graph classification dataset Reddit Threads (Rozemberczki et al., 2020). [...] we evaluate on the UPFD fake news detection datasets (Dou et al., 2021).
Dataset Splits | Yes | We used the standard PyG train/val/test split of 83.3/8.3/8.3% graphs. The binary graph classification dataset Reddit Threads (Rozemberczki et al., 2020) contains 203 088 graphs with an average of 23.9 nodes. We used a stratified random split of 75/12.5/12.5%. The binary graph classification dataset UPFD gossipcop (Dou et al., 2021) contains 5464 graphs with an average of 58 nodes. We use the standard PyG split of 20/10/70%. The binary graph classification dataset UPFD politifact (Dou et al., 2021) contains 314 graphs with an average of 131 nodes. We use the standard PyG split of 20/10/70%.
Hardware Specification | No | No specific hardware details such as CPU/GPU models or memory specifications are provided in the paper.
Software Dependencies | No | The paper mentions PyTorch 2.7.1 in a theoretical discussion about the ReLU function's differentiability, and PyG in relation to dataset splits, but does not provide a comprehensive list of software dependencies with specific version numbers used for their experimental setup.
Experiment Setup | Yes | To obtain trained models of comparable performance for each architecture type, we performed a hyperparameter search for each model and dataset. [...] The final hyperparameters of the best models used for the robustness results are shown for Graphormer in Tab. 4, for SAN in Tab. 5, for GRIT in Tab. 6, for Polynormer in Tab. 8, for GPS in Tab. 7, for GCN in Tab. 9, for GPS-GCN in Tab. 10, for GAT in Tab. 11, and for GATv2 in Tab. 12.
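The pseudocode entry above refers to the paper's "k-step free adversarial training" (Algorithm 1). For orientation, here is a generic PyTorch sketch of free adversarial training in the style of Shafahi et al. (2019), where the perturbation is carried across k replays of each minibatch so adversarial examples add little gradient cost. This is an illustrative assumption, not a reproduction of the paper's Algorithm 1, and `model`, `loader`, and `optimizer` are placeholders.

```python
import torch

def free_adversarial_training(model, loader, optimizer, k=4, eps=0.1):
    """Generic sketch of 'free' adversarial training (not the paper's exact Algorithm 1)."""
    delta = None  # perturbation reused between replays of a minibatch
    for x, y in loader:
        if delta is None or delta.shape != x.shape:
            delta = torch.zeros_like(x)
        for _ in range(k):  # k replays of the same minibatch
            delta.requires_grad_(True)
            loss = torch.nn.functional.cross_entropy(model(x + delta), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()  # model update from this replay
            # reuse the input gradient for an FGSM-style perturbation step
            with torch.no_grad():
                delta = (delta + eps * delta.grad.sign()).clamp(-eps, eps)
```

Each backward pass thus serves double duty: it updates the model parameters and supplies the input gradient for the next perturbation step.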
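The dataset-splits row mentions a stratified random 75/12.5/12.5% split for Reddit Threads. One way such a split could be produced is with scikit-learn; this is an assumption for illustration, since the paper does not specify its splitting code.

```python
from sklearn.model_selection import train_test_split

def stratified_split(indices, labels, seed=0):
    """Hypothetical 75/12.5/12.5 stratified random split (not the paper's code)."""
    # Carve off the 75% training portion, stratified by class label.
    train_idx, rest_idx, _, rest_y = train_test_split(
        indices, labels, train_size=0.75, stratify=labels, random_state=seed)
    # Split the remaining 25% evenly into validation and test (12.5% each).
    val_idx, test_idx = train_test_split(
        rest_idx, train_size=0.5, stratify=rest_y, random_state=seed)
    return train_idx, val_idx, test_idx
```

Stratification keeps the class ratio of the binary labels roughly constant across all three partitions.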