A Scalable and Effective Alternative to Graph Transformers
Authors: Kaan Sancak, Zhigang Hua, Jin Fang, Yan Xie, Andrey Malevich, Bo Long, Muhammed Fatih Balin, Ümit V. Çatalyürek
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study on synthetic datasets reveals that GECO achieves a 169× speedup over optimized attention on a graph with 2M nodes. Further evaluations on a diverse range of benchmarks show that GECO scales to large graphs where traditional GTs often face memory and time limitations. Notably, GECO consistently achieves quality comparable or superior to baselines, improving the SOTA by up to 4.5%, and offering a scalable and effective solution for large-scale graph learning. (Section 4: Experiments) |
| Researcher Affiliation | Collaboration | 1School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA 2Meta AI |
| Pseudocode | Yes | Algorithm 1: Forward pass of GCB Operator; Algorithm 2: End-to-end GECO Model Training |
| Open Source Code | Yes | Code https://github.com/kaansancak/GECO |
| Open Datasets | Yes | Long Range Graph Benchmark (LRGB). Table 1 presents our evaluation on the LRGB, a collection of graph tasks designed to test a model's ability to capture long-range dependencies. PCQM4Mv2. Table 2 shows that GECO outperforms both GNN and GT baselines on PCQM4Mv2 in prediction quality. Table 3: Accuracy on large node prediction datasets; the first, second, and third best results are highlighted. We reuse the results from (Han et al. 2023; Shirzad et al. 2023; Zeng et al. 2021), and run Exphormer locally except on Arxiv. |
| Dataset Splits | No | The paper mentions using a 'validation set' for evaluation on PCQM4Mv2 and states in Section 4.1: 'For dataset and hyperparameter details please refer to the extended version.' It does not explicitly provide dataset split information (percentages or counts) in the main text. |
| Hardware Specification | Yes | Even the most computation-intensive GTs, such as Graphormer, can be trained on these datasets using Nvidia-V100 (32GB) or Nvidia-A100 (40GB) GPUs (Ying et al. 2021; Rampasek et al. 2022). |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies such as programming languages, libraries, or frameworks used in their implementation. |
| Experiment Setup | No | For dataset and hyperparameter details please refer to the extended version. |