Attending to Graph Transformers

Authors: Luis Müller, Mikhail Galkin, Christopher Morris, Ladislav Rampášek

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we conduct an empirical study to complement our taxonomy in a separate direction. Concretely, we empirically evaluate two highly discussed aspects of graph transformers: (1) the effectiveness of incorporating graph structural bias into GTs, and (2) their ability to reduce over-smoothing and over-squashing.
Researcher Affiliation | Collaboration | Luis Müller (EMAIL), Department of Computer Science, RWTH Aachen University; Mikhail Galkin (EMAIL), Intel AI Lab; Ladislav Rampášek (EMAIL), Isomorphic Labs; Christopher Morris (EMAIL), Department of Computer Science, RWTH Aachen University
Pseudocode | No | The paper describes various architectures and methodologies but does not contain any explicitly labeled pseudocode or algorithm blocks. The methods are described in narrative text and mathematical equations.
Open Source Code | Yes | Our code is available at https://github.com/luis-mueller/probing-graph-transformers.
Open Datasets | Yes | We investigate this task using a custom dataset, Edges, derived from the Zinc (Dwivedi et al., 2023) dataset. For this task, we evaluate models on the Triangles dataset proposed by Knyazev et al. (2019)... We evaluate models on the CSL dataset (Dwivedi et al., 2023)... six heterophilic transductive datasets: Actor (Tang et al., 2009); Cornell, Texas, Wisconsin (CMU, 2001); Chameleon and Squirrel (Rozemberczki et al., 2021). In addition, we also consider the recently proposed heterophilic datasets in (Platonov et al., 2023).
Dataset Splits | Yes | The dataset specifies a fixed train/validation/test split, which we adopt in our experiments. ... We follow Dwivedi et al. (2023) in training with 5-fold cross-validation. ... For the six small datasets, we broadly follow the Set 1 hyper-parameters (Table 1). However, we perform a grid search for each model variant... For the five large datasets, we closely follow the hyper-parameter tuning and training setup described in (Platonov et al., 2023). ... Results over 10 splits, following (Platonov et al., 2023).
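The 5-fold cross-validation protocol mentioned above can be sketched as follows. This is a minimal illustration only: the paper follows the fold assignment of Dwivedi et al. (2023), which may differ from this random split, and `k_fold_splits` is a hypothetical helper, not taken from the authors' codebase.

```python
import numpy as np

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation.

    Illustrative sketch: shuffles the sample indices once, partitions them
    into k near-equal folds, and holds out each fold in turn as validation.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    folds = np.array_split(perm, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Each sample appears in exactly one validation fold across the 5 iterations.
splits = list(k_fold_splits(100, k=5))
```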
Hardware Specification | Yes | All experiments were run on a single NVIDIA A100 GPU with 80 GB of GPU RAM.
Software Dependencies | No | The paper mentions basing its implementation on 'GraphGPS (Rampášek et al., 2022)' and using the 'GELU non-linearity (Hendrycks & Gimpel, 2016)', but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | To simplify hyper-parameter selection, we hand-designed two general sets of hyper-parameters; see Table 1. For Edges and Triangles, we fix a parameter budget of around 200k for the transformer models, resulting in six layers for each model with the respective embedding sizes specified in Table 1. Further, we train Graphormer for 1k epochs. ... For the six small datasets Actor, Cornell, Texas, Wisconsin, Chameleon, and Squirrel, we select hyper-parameters with a grid search over the hidden dimension (32, 64, 96), dropout (0.0, 0.2, 0.5, 0.8) and, where applicable, attention dropout (0.0, 0.2, 0.5). For the five large datasets in Platonov et al. (2023), we follow their hyper-parameter selection exactly to enable a fair comparison. As a result, we only tune the number of layers (1, 2, 3, 4, 5).
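The grid search described for the six small datasets can be sketched as below. This is a hedged illustration of the search space only: the value tuples come from the quoted setup, while the enumeration itself is a generic `itertools.product` sketch, not the authors' implementation.

```python
from itertools import product

# Search space quoted from the paper's setup for the six small datasets.
hidden_dims = (32, 64, 96)
dropouts = (0.0, 0.2, 0.5, 0.8)
attn_dropouts = (0.0, 0.2, 0.5)  # only used where the model has attention

# Full Cartesian grid for an attention-based model variant:
# 3 hidden dims x 4 dropouts x 3 attention dropouts = 36 configurations.
grid = list(product(hidden_dims, dropouts, attn_dropouts))
print(len(grid))  # 36
```

For models without attention, the grid would collapse to the 3 x 4 = 12 combinations of hidden dimension and dropout.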