Where Did the Gap Go? Reassessing the Long-Range Graph Benchmark

Authors: Jan Tönshoff, Martin Ritzert, Eran Rosenbluth, Martin Grohe

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we carefully reevaluate multiple MPGNN baselines as well as the Graph Transformer GPS (Rampášek et al., 2022) on LRGB. Through a rigorous empirical analysis, we demonstrate that the reported performance gap is overestimated due to suboptimal hyperparameter choices. It is noteworthy that across multiple datasets the performance gap completely vanishes after basic hyperparameter optimization. In addition, we discuss the impact of lacking feature normalization for LRGB's vision datasets and highlight a spurious implementation of LRGB's link prediction metric. The principal aim of our paper is to establish a higher standard of empirical rigor within the graph machine learning community.
Researcher Affiliation | Academia | Jan Tönshoff (toenshoff@informatik.rwth-aachen.de), RWTH Aachen University; Martin Ritzert (EMAIL), Georg-August-Universität Göttingen; Eran Rosenbluth (EMAIL), RWTH Aachen University; Martin Grohe (EMAIL), RWTH Aachen University
Pseudocode | No | The paper describes its methods and experiments in detail but contains no clearly labeled pseudocode or algorithm blocks; all procedural descriptions are given in paragraph form.
Open Source Code | Yes | Our contribution is three-fold: First, we show that the three MPGNN baselines GCN, GINE, and GatedGCN all profit massively from further hyperparameter tuning, reducing and even closing the gap to graph transformers on multiple datasets. ... Source code: https://github.com/toenshoff/LRGB
Open Datasets | Yes | The recent Long-Range Graph Benchmark (LRGB, Dwivedi et al. 2022) introduced a set of graph learning tasks strongly dependent on long-range interaction between vertices. ... The Long-Range Graph Benchmark (LRGB) has been introduced by Dwivedi et al. (2022) as a collection of five datasets: Peptides-func and Peptides-struct are graph-level classification and regression tasks, respectively. ... PascalVOC-SP and COCO-SP model semantic image segmentation as a node-classification task on superpixel graphs. PCQM-Contact is a link prediction task on molecular graphs.
Dataset Splits | Yes | Table 1a provides the results obtained on the test splits of Peptides-func and Peptides-struct. ... For the final evaluation runs we average results across four different random seeds as specified by the LRGB dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used to run the experiments. It focuses on the experimental setup and results but omits hardware specifications.
Software Dependencies | No | The paper mentions using an "AdamW optimizer" and "GELU (Hendrycks & Gimpel, 2016) as our default activation function" but does not specify version numbers for programming languages, machine learning frameworks (e.g., PyTorch, TensorFlow), or other key software libraries.
Experiment Setup | Yes | We tune the main hyperparameters (such as depth, dropout rate, ...) in pre-defined ranges while strictly adhering to the official 500k parameter budget. The exact hyperparameter ranges and all final configurations are provided in Appendix A.1. In particular, we looked at networks with 6 to 10 layers, varied the number of layers in the prediction head from 1 to 3 (which turned out to be very relevant), and also considered the dropout and learning rate of the network. ... Overall, we tried to incorporate the most important hyperparameters, which we selected to be dropout, model depth, prediction head depth, learning rate, and the used positional or structural encoding. For GPS we additionally evaluated the internal MPGNN (but only between GCN and GatedGCN) and whether to use BatchNorm or LayerNorm. Thus, our hyperparameters and ranges were as follows: dropout [0, 0.1, 0.2], default 0.1; depth [6, 8, 10], default 8. ... learning rate [0.001, 0.0005, 0.0001], default 0.001; head depth [1, 2, 3], default 2; encoding [none, LapPE, RWSE], default none; internal MPGNN [GCN, GatedGCN], default GatedGCN (only for GPS); normalization [BatchNorm, LayerNorm], default BatchNorm (only for GPS).
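The hyperparameter ranges quoted in the Experiment Setup row can be sketched as a Cartesian product. This is an illustrative reconstruction only: the dictionary keys are hypothetical names, not identifiers from the paper's code, and the full-product sizes computed below are an upper bound on the search (the paper's actual tuning protocol and final configurations are in its Appendix A.1).

```python
from itertools import product

# Hyperparameter ranges as reported in the paper (key names are illustrative).
GRID = {
    "dropout": [0.0, 0.1, 0.2],           # default 0.1
    "depth": [6, 8, 10],                  # default 8
    "learning_rate": [1e-3, 5e-4, 1e-4],  # default 1e-3
    "head_depth": [1, 2, 3],              # default 2
    "encoding": ["none", "LapPE", "RWSE"],  # default "none"
}

# Two extra axes tuned only for the GPS graph transformer.
GPS_EXTRA = {
    "internal_mpgnn": ["GCN", "GatedGCN"],        # default "GatedGCN"
    "normalization": ["BatchNorm", "LayerNorm"],  # default "BatchNorm"
}

def configurations(grid):
    """Yield every configuration in the Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

# Full-product sizes; the paper does not claim an exhaustive grid search,
# so these are upper bounds on the number of distinct runs per model.
n_mpgnn = sum(1 for _ in configurations(GRID))                 # 3^5 = 243
n_gps = sum(1 for _ in configurations({**GRID, **GPS_EXTRA}))  # 243 * 4 = 972
```

Enumerating the grid this way makes the 500k-parameter budget check easy to bolt on: each yielded configuration can be instantiated, its parameter count measured, and over-budget settings skipped before training.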