Beyond Message Passing: Neural Graph Pattern Machine

Authors: Zehong Wang, Zheyuan Zhang, Tianyi Ma, Nitesh V. Chawla, Chuxu Zhang, Yanfang Ye

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations across four standard tasks (node classification, link prediction, graph classification, and graph regression) demonstrate that GPM outperforms state-of-the-art baselines. Further analysis reveals that GPM exhibits strong out-of-distribution generalization, desirable scalability, and enhanced interpretability. Our experimental results also show that GPM is robust to out-of-distribution issues, and can be readily scaled to large graphs, large model sizes, and distributed training.
Researcher Affiliation | Academia | ¹University of Notre Dame, ²University of Connecticut. Correspondence to: Zehong Wang <EMAIL>, Yanfang Ye <EMAIL>.
Pseudocode | No | The paper describes the model architecture and methods in detail through text and diagrams (e.g., Figure 1, Figure 2), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and datasets are available at: https://github.com/Zehong-Wang/GPM.
Open Datasets | Yes | Code and datasets are available at: https://github.com/Zehong-Wang/GPM. We conduct experiments on benchmark datasets of varying scales, with their statistics and homophily ratios summarized in Table 1. The datasets include Products, Computer, Arxiv, Wiki CS, Cora Full, Deezer, Blog, Flickr, and Flickr-S (Small). We evaluate the link prediction performance on three datasets: Cora, Pubmed, and ogbl-Collab. We evaluate the model on six graph datasets: social networks (IMDB-B, COLLAB, Reddit-M5K, Reddit-M12K) for classification and molecule graphs (ZINC and ZINC-Full) for regression.
Dataset Splits | Yes | We adopt the dataset splits from Chen et al. (2023) and Chen et al. (2024): public splits for Wiki CS, Flickr, Arxiv, and Products; 60/20/20 train/val/test split for Cora Full and Computer; 50/25/25 split for the remaining datasets. Following Guo et al. (2023), we split the edges into 80/5/15 train/val/test sets and use Hits@20 for Cora and Pubmed, and Hits@50 for ogbl-Collab for evaluation. We use 80/10/10 train/val/test splits for social networks, and the public splits for molecule graphs.
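The percentage splits quoted above can be sketched as a simple shuffled index partition. This is a minimal illustration, not the authors' code: `split_indices` is a hypothetical helper, and the GPM repository may construct its masks differently (e.g., per-class stratification).

```python
import random

def split_indices(n, train=0.6, val=0.2, seed=0):
    # Hedged sketch of a 60/20/20 train/val/test split (as reported for
    # Cora Full and Computer). Shuffle node indices with a fixed seed,
    # then cut the list at the train and validation boundaries.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

The same helper covers the 50/25/25 and 80/10/10 splits by changing the `train` and `val` fractions; the "public splits" cases come with fixed masks and need no shuffling.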
Hardware Specification | Yes | Most experiments are conducted on Linux servers equipped with four Nvidia A40 GPUs.
Software Dependencies | Yes | The models are implemented using PyTorch 2.4.0, PyTorch Geometric 2.6.1, and PyTorch Cluster 1.6.3, with CUDA 12.1 and Python 3.9.
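A reproduction attempt can compare its installed stack against the versions reported above before running anything. The sketch below is a hypothetical check (the repository is not stated to ship one); `EXPECTED` and `check_versions` are names invented here, and the installed-version mapping would in practice come from `importlib.metadata.version`.

```python
# Reported stack: PyTorch 2.4.0, PyTorch Geometric 2.6.1,
# PyTorch Cluster 1.6.3, CUDA 12.1, Python 3.9.
EXPECTED = {
    "torch": "2.4.0",
    "torch_geometric": "2.6.1",
    "torch_cluster": "1.6.3",
}

def check_versions(installed):
    # `installed` maps package name -> version string (or is missing the
    # key entirely). Returns {package: (expected, found)} for every
    # mismatch; an empty dict means the environment matches the paper.
    return {
        pkg: (want, installed.get(pkg))
        for pkg, want in EXPECTED.items()
        if installed.get(pkg) != want
    }
```

Returning the mismatches rather than raising lets a reproduction script log every deviation at once instead of failing on the first one.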
Experiment Setup | Yes | In our setup, we use the AdamW optimizer with weight decay and apply early stopping after 100 epochs. Label smoothing is set to 0.05, and gradient clipping is fixed at 1.0 to stabilize training. The learning rate follows a warm-up schedule with 100 warm-up steps by default. The batch size is set to 256 by default. We perform hyperparameter search over the following ranges: learning rate {1e-2, 5e-3, 1e-3}, positional embedding dimension {4, 8, 20}, dropout {0.1, 0.3, 0.5}, weight decay {1e-2, 0}, and weighting coefficient λ {0.1, 0.5, 1.0}. For pattern sampling, we set p = 1, q = 1 by default (see Appendix G for details). The model configuration includes a hidden dimension of 256, 4 attention heads, and 1 transformer layer. The selected hyperparameters are summarized in Table 6.
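Two pieces of the setup above lend themselves to a worked sketch: the 100-step learning-rate warm-up and the size of the reported hyperparameter grid. Both snippets are illustrations under stated assumptions, the linear warm-up shape in particular is an assumption, since the paper only states the number of warm-up steps, and `warmup_lr` is a name invented here.

```python
import itertools

def warmup_lr(step, base_lr, warmup_steps=100):
    # Scale the base learning rate up over the first `warmup_steps`
    # optimizer steps, then hold it constant. Linear ramp assumed;
    # the paper specifies only "100 warm-up steps by default".
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# The reported search ranges, enumerated as a full grid.
grid = {
    "lr": [1e-2, 5e-3, 1e-3],
    "pos_dim": [4, 8, 20],
    "dropout": [0.1, 0.3, 0.5],
    "weight_decay": [1e-2, 0],
    "lambda": [0.1, 0.5, 1.0],
}
configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
# 3 * 3 * 3 * 2 * 3 = 162 candidate configurations per dataset
```

Even this modest grid yields 162 runs per dataset, which helps explain the default batch size and early-stopping budget: the search cost dominates the per-run cost.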