GraphGPT: Generative Pre-trained Graph Eulerian Transformer

Authors: Qifang Zhao, Weidong Ren, Tianyu Li, Hong Liu, Xingsheng He, Xiaoxiao Xu

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on OGB datasets demonstrate GraphGPT's superiority: it achieves SOTA results in graph- and edge-level tasks (e.g., molecular property prediction on PCQM4Mv2 and protein-protein interaction on ogbl-ppa), while delivering competitive performance in node-level tasks. ... 3.5. Ablation Study
Researcher Affiliation Industry Alibaba Inc., Hangzhou, China. Correspondence to: Qifang Zhao <EMAIL>, Xiaoxiao Xu <EMAIL>.
Pseudocode No The paper describes the
Open Source Code Yes To advance research in graph foundation models and facilitate scientific discovery in chemistry, materials science, and related fields, we have released the source code[1] and model checkpoints[2]. [1] https://github.com/alibaba/graph-gpt
Open Datasets Yes To demonstrate its versatility across graph tasks, we select benchmarks for graph-, edge-, and node-level objectives: Graph-level: PCQM4Mv2 (quantum chemistry), ogbgmolpcba (molecular property prediction) and Triangles (triangles counting). Edge-level: ogbl-ppa (protein-protein associations) and ogbl-citation2 (citation networks). Node-level: ogbn-proteins (protein interaction networks) and ogbn-arxiv (paper categorization).
Dataset Splits Yes To evaluate GraphGPT's ability to learn structural patterns through generative pre-training, we use the Triangles dataset with the task of counting triangles. The dataset is split into: 1) Training/Validation: 30k and 5k small graphs (≤ 25 nodes); 2) Testing: 5k small graphs (Test-small) and 5k large graphs (25–100 nodes, Test-large).
Hardware Specification Yes The models are pre-trained and fine-tuned on A800-80G GPU clusters using DeepSpeed's Stage-2 strategy with mixed precision (FP16/FP32) or BF16 (Rasley et al., 2020). We employ the AdamW optimizer (Loshchilov & Hutter, 2019) with a learning rate scheduler. ... Table 18. Computational cost details of the main datasets in the paper. PT means pre-training and FT stands for fine-tuning. Time is measured in hours. The model size is Base as in Tab. 11, with about 110M parameters. The corresponding hyper-parameters can be found in Tab. 13, 14, 15, 16. Example row: dataset ogbl-ppa | model size B | PT time 58.73 h | FT time 112.62 h | GPU-PT 8 Nvidia L20 | GPU-FT 16 V100-32G
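The quoted training setup (DeepSpeed ZeRO Stage-2 with FP16/BF16 mixed precision and an AdamW optimizer) can be expressed as a DeepSpeed-style configuration. The sketch below is illustrative only: the key names follow DeepSpeed's documented JSON config schema, but the numeric values (batch size, learning rate, weight decay) are assumptions, not the paper's actual hyper-parameters (those are in Tab. 13–16).

```python
# Hedged sketch of a DeepSpeed ZeRO Stage-2 mixed-precision config of the
# kind the quoted setup describes. Values are illustrative placeholders.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 32,   # illustrative, not the paper's
    "zero_optimization": {"stage": 2},      # ZeRO Stage-2 partitioning
    "fp16": {"enabled": True},              # or "bf16": {"enabled": True}
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 3e-4, "weight_decay": 0.01},  # illustrative
    },
}
print(json.dumps(ds_config, indent=2))
```

Such a dictionary would typically be passed to `deepspeed.initialize` (or written to a JSON file referenced by the launcher) alongside the model.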
Software Dependencies No The implementation uses Py Torch as the primary framework. For graph preprocessing tasks such as subgraph sampling, we utilize torch-geometric (Fey & Lenssen, 2019). When required, we employ Network X (Hagberg et al., 2008) to Eulerize (sub)graphs and identify (semi-)Eulerian paths. ... We employ a transformer architecture based on Llama (Touvron et al., 2023), implemented via the Hugging Face Transformers library (Wolf et al., 2020).
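The Eulerian-path serialization mentioned above is the step that turns a graph into a token sequence a transformer can consume: a (semi-)Eulerian path visits every edge exactly once, so the resulting node sequence encodes the full edge set. The paper uses NetworkX (`nx.eulerize`, `nx.eulerian_path`) for this; the dependency-free Hierholzer-style routine below is an illustrative stand-in for the idea, not the authors' implementation, and assumes a connected graph.

```python
# Minimal sketch: find a (semi-)Eulerian path in an undirected connected
# graph, i.e., a node sequence that traverses every edge exactly once.
from collections import defaultdict

def eulerian_path(edges):
    """Return a node sequence using each undirected edge exactly once,
    or None if no Eulerian path exists (assumes the graph is connected)."""
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    # An Eulerian path exists iff 0 or 2 nodes have odd degree.
    odd = [n for n in adj if len(adj[n]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None
    start = odd[0] if odd else next(iter(adj))
    used = [False] * len(edges)
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm, iterative form
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()  # discard the mirror copy of an edge already walked
        if adj[u]:
            v, i = adj[u].pop()
            used[i] = True
            stack.append(v)
        else:
            path.append(stack.pop())
    return path[::-1]

# Path graph 0-1-2-3: two odd-degree endpoints, so semi-Eulerian.
print(eulerian_path([(0, 1), (1, 2), (2, 3)]))  # -> [0, 1, 2, 3]
```

When a graph has more than two odd-degree nodes, Eulerizing it (duplicating a few edges, as `nx.eulerize` does) restores the degree condition so a single traversal still covers every edge.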
Experiment Setup Yes Table 12. Pre-train and fine-tune configurations for the PCQM4M-v2 dataset. LSI means layer-scale-initialization, EMA is exponential moving average, MPE stands for max-position-embedding, and TWE means tie-word-embeddings. ... Table 13. The Pre-training and fine-tuning configurations for the ogbl-ppa dataset. ... Table 14. Pre-train and fine-tune configurations for the ogbl-citation2 dataset. ... Table 15. Configurations of pre-training with SMTP and fine-tuning for the ogbn-proteins dataset. ... Table 16. Configurations of pre-training with SMTP and fine-tuning for the ogbn-arxiv dataset.