Adversarial Robustness of Graph Transformers
Authors: Philipp Foth, Lukas Gosch, Simon Geisler, Leo Schwinn, Stephan Günnemann
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our attacks on multiple tasks and perturbation models, including structure perturbations for node and graph classification, and node injection for graph classification. Our results reveal that GTs can be catastrophically fragile in many cases. Addressing this vulnerability, we show how our adaptive attacks can be effectively used for adversarial training, substantially improving robustness. |
| Researcher Affiliation | Academia | Philipp Foth EMAIL School of Computation, Information and Technology Technical University of Munich |
| Pseudocode | Yes | Algorithm 1 Our k-step free adversarial training |
| Open Source Code | Yes | The code to reproduce our results can be found at https://github.com/isefos/gt_robustness. |
| Open Datasets | Yes | We first evaluate our structure attacks on CLUSTER (Dwivedi et al., 2023) [...] We also consider the graph classification dataset Reddit Threads (Rozemberczki et al., 2020). [...] we evaluate on the UPFD fake news detection datasets (Dou et al., 2021). |
| Dataset Splits | Yes | We used the standard PyG train/val/test split of 83.3/8.3/8.3% graphs. The binary graph classification dataset Reddit Threads (Rozemberczki et al., 2020) contains 203 088 graphs with an average of 23.9 nodes. We used a stratified random split of 75/12.5/12.5%. The binary graph classification dataset UPFD gossipcop (Dou et al., 2021) contains 5464 graphs with an average of 58 nodes. We use the standard PyG split of 20/10/70%. The binary graph classification dataset UPFD politifact (Dou et al., 2021) contains 314 graphs with an average of 131 nodes. We use the standard PyG split of 20/10/70%. |
| Hardware Specification | No | No specific hardware details such as CPU/GPU models or memory specifications are provided in the paper. |
| Software Dependencies | No | The paper mentions PyTorch 2.7.1 in a theoretical discussion about the ReLU function's differentiability, and PyG in relation to dataset splits, but does not provide a comprehensive list of software dependencies with specific version numbers used for their experimental setup. |
| Experiment Setup | Yes | To obtain trained models of comparable performance for each architecture type, we performed a hyperparameter search for each model and dataset. [...] The final hyperparameters of the best models used for the robustness results are shown for Graphormer in Tab. 4, for SAN in Tab. 5, for GRIT in Tab. 6, for Polynormer in Tab. 8, for GPS in Tab. 7, for GCN in Tab. 9, for GPS-GCN in Tab. 10, for GAT in Tab. 11, and for GATv2 in Tab. 12. |
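The "k-step free" adversarial training named under Pseudocode (Algorithm 1) follows the general free-AT idea of replaying each batch k times, updating the model weights and a persistent input perturbation from the same gradient computation. The sketch below is a minimal NumPy illustration of that scheme on logistic regression, not the paper's graph-transformer setting; the function name, hyperparameters, and loss are all illustrative assumptions.

```python
import numpy as np

def free_adversarial_training(X, y, k=4, epochs=50, eps=0.1, lr=0.1, alpha=0.05):
    """Sketch of k-step free adversarial training (logistic regression).

    Each pass is replayed k times; every replay simultaneously
    descends on the weights and ascends on a persistent L-inf-bounded
    perturbation delta, reusing one gradient computation.
    """
    n, d = X.shape
    w = np.zeros(d)
    delta = np.zeros_like(X)  # persistent perturbation, carried across replays
    for _ in range(epochs):
        for _ in range(k):  # replay the same batch k times ("free" steps)
            z = (X + delta) @ w
            p = 1.0 / (1.0 + np.exp(-z))   # sigmoid prediction
            g_z = (p - y) / n              # dBCE/dz, shared by both updates
            grad_w = (X + delta).T @ g_z   # gradient w.r.t. weights
            grad_x = np.outer(g_z, w)      # gradient w.r.t. inputs (dz/dx = w)
            w -= lr * grad_w                                             # descent step
            delta = np.clip(delta + alpha * np.sign(grad_x), -eps, eps)  # FGSM-style ascent
    return w
```

The perturbation `delta` is deliberately not reset between replays, which is what distinguishes free adversarial training from running a full PGD attack inside every weight update.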