GraphGPT: Generative Pre-trained Graph Eulerian Transformer
Authors: Qifang Zhao, Weidong Ren, Tianyu Li, Hong Liu, Xingsheng He, Xiaoxiao Xu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on OGB datasets demonstrate GraphGPT's superiority: it achieves SOTA results in graph- and edge-level tasks (e.g., molecular property prediction on PCQM4Mv2 and protein-protein interaction on ogbl-ppa), while delivering competitive performance in node-level tasks. ... 3.5. Ablation Study |
| Researcher Affiliation | Industry | 1Alibaba Inc., Hangzhou, China. Correspondence to: Qifang Zhao <EMAIL>, Xiaoxiao Xu <EMAIL>. |
| Pseudocode | No | The paper describes the |
| Open Source Code | Yes | To advance research in graph foundation models and facilitate scientific discovery in chemistry, materials science, and related fields, we have released the source code and model checkpoints. https://github.com/alibaba/graph-gpt |
| Open Datasets | Yes | To demonstrate its versatility across graph tasks, we select benchmarks for graph-, edge-, and node-level objectives: Graph-level: PCQM4Mv2 (quantum chemistry), ogbg-molpcba (molecular property prediction) and Triangles (triangle counting). Edge-level: ogbl-ppa (protein-protein associations) and ogbl-citation2 (citation networks). Node-level: ogbn-proteins (protein interaction networks) and ogbn-arxiv (paper categorization). |
| Dataset Splits | Yes | To evaluate GraphGPT's ability to learn structural patterns through generative pre-training, we use the Triangles dataset with the task of counting triangles. The dataset is split into: 1). Training/Validation: 30k and 5k small graphs (≤ 25 nodes); 2). Testing: 5k small graphs (Test-small) and 5k large graphs (25–100 nodes, Test-large). |
| Hardware Specification | Yes | The models are pre-trained and fine-tuned on A800-80G GPU clusters using DeepSpeed's Stage-2 strategy with mixed precision (FP16/FP32) or BF16 (Rasley et al., 2020). We employ the AdamW optimizer (Loshchilov & Hutter, 2019) with a learning rate scheduler. ... Table 18. Computational cost details of the main datasets in the paper. PT means pre-training and FT stands for fine-tuning. Time is measured in hours. The model size is Base as in Tab. 11 with about 110M parameters. The corresponding hyper-parameters can be found in Tab. 13, 14, 15, 16. Example row: dataset ogbl-ppa, model size B, PT time 58.73 h, FT time 112.62 h, GPU-PT 8× Nvidia L20, GPU-FT 16× V100-32G |
| Software Dependencies | No | The implementation uses PyTorch as the primary framework. For graph preprocessing tasks such as subgraph sampling, we utilize torch-geometric (Fey & Lenssen, 2019). When required, we employ NetworkX (Hagberg et al., 2008) to Eulerize (sub)graphs and identify (semi-)Eulerian paths. ... We employ a transformer architecture based on Llama (Touvron et al., 2023), implemented via the Hugging Face Transformers library (Wolf et al., 2020). |
| Experiment Setup | Yes | Table 12. Pre-train and fine-tune configurations for the PCQM4M-v2 dataset. LSI means layer-scale-initialization, EMA is exponential moving average, MPE stands for max-position-embedding, and TWE means tie-word-embeddings. ... Table 13. The Pre-training and fine-tuning configurations for the ogbl-ppa dataset. ... Table 14. Pre-train and fine-tune configurations for the ogbl-citation2 dataset. ... Table 15. Configurations of pre-training with SMTP and fine-tuning for the ogbn-proteins dataset. ... Table 16. Configurations of pre-training with SMTP and fine-tuning for the ogbn-arxiv dataset. |
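The Hardware Specification row describes training with DeepSpeed Stage-2, BF16/FP16 mixed precision, and AdamW with a learning-rate scheduler. A hedged sketch of what such a DeepSpeed config file might look like; all numeric values below are hypothetical placeholders, not the hyper-parameters from the paper's Tables 12–16:

```json
{
  "train_micro_batch_size_per_gpu": 256,
  "zero_optimization": { "stage": 2 },
  "bf16": { "enabled": true },
  "optimizer": {
    "type": "AdamW",
    "params": { "lr": 3e-4, "weight_decay": 0.01 }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": { "warmup_num_steps": 1000, "total_num_steps": 100000 }
  }
}
```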
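The Software Dependencies row quotes the paper as using NetworkX to Eulerize (sub)graphs and identify (semi-)Eulerian paths. A minimal sketch of that preprocessing step, assuming NetworkX's `eulerize`/`eulerian_path` API; the toy graph and the node-sequence serialization at the end are illustrative, not the paper's exact tokenization:

```python
import networkx as nx

# Hypothetical small graph; the paper applies this to sampled (sub)graphs.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])

# A (semi-)Eulerian path traverses every edge exactly once. If the graph
# has no such path, eulerize() duplicates a minimal set of edges so that
# an Eulerian circuit exists.
if not nx.has_eulerian_path(G):
    G = nx.eulerize(G)

# Walk the Eulerian path and serialize the graph as its node sequence,
# which is the kind of linear token stream a transformer can consume.
path_edges = list(nx.eulerian_path(G))
token_sequence = [path_edges[0][0]] + [v for _, v in path_edges]
print(token_sequence)
```

Because every edge appears exactly once in the path, the node sequence losslessly encodes the graph's connectivity in linear form.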