Beyond Message Passing: Neural Graph Pattern Machine

Authors: Zehong Wang, Zheyuan Zhang, Tianyi Ma, Nitesh V. Chawla, Chuxu Zhang, Yanfang Ye

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations across four standard tasks (node classification, link prediction, graph classification, and graph regression) demonstrate that GPM outperforms state-of-the-art baselines. Further analysis reveals that GPM exhibits strong out-of-distribution generalization, desirable scalability, and enhanced interpretability. Our experimental results also show that GPM is robust to out-of-distribution issues, and can be readily scaled to large graphs, large model sizes, and distributed training.
Researcher Affiliation | Academia | ¹University of Notre Dame, ²University of Connecticut. Correspondence to: Zehong Wang <EMAIL>, Yanfang Ye <EMAIL>.
Pseudocode | No | The paper describes the model architecture and methods in detail through text and diagrams (e.g., Figure 1, Figure 2), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and datasets are available at: https://github.com/Zehong-Wang/GPM.
Open Datasets | Yes | Code and datasets are available at: https://github.com/Zehong-Wang/GPM. We conduct experiments on benchmark datasets of varying scales, with their statistics and homophily ratios summarized in Table 1. The datasets include Products, Computer, Arxiv, Wiki CS, Cora Full, Deezer, Blog, Flickr, and Flickr-S (Small). We evaluate the link prediction performance on three datasets: Cora, Pubmed, and ogbl-Collab. We evaluate the model on six graph datasets: social networks (IMDB-B, COLLAB, Reddit-M5K, Reddit-M12K) for classification and molecule graphs (ZINC and ZINC-Full) for regression.
Dataset Splits | Yes | We adopt the dataset splits from Chen et al. (2023) and Chen et al. (2024): public splits for Wiki CS, Flickr, Arxiv, and Products; 60/20/20 train/val/test split for Cora Full and Computer; 50/25/25 split for the remaining datasets. Following Guo et al. (2023), we split the edges into 80/5/15 train/val/test sets and use Hits@20 for Cora and Pubmed, and Hits@50 for ogbl-Collab for evaluation. We use 80/10/10 train/val/test splits for social networks, and the public splits for molecule graphs.
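The percentage splits quoted above can be sketched as a simple shuffled index partition. This is a minimal illustration, not the authors' code: `split_indices` is a hypothetical helper, and the GPM repository may construct its masks differently (e.g., per-class stratification).

```python
import random

def split_indices(n, train=0.6, val=0.2, seed=0):
    # Hedged sketch of a 60/20/20 train/val/test split (as reported for
    # Cora Full and Computer). Shuffle node indices with a fixed seed,
    # then cut the list at the train and validation boundaries.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

The same helper covers the 50/25/25 and 80/10/10 splits by changing the `train` and `val` fractions; the "public splits" cases come with fixed masks and need no shuffling.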
Hardware Specification | Yes | Most experiments are conducted on Linux servers equipped with four Nvidia A40 GPUs.
Software Dependencies | Yes | The models are implemented using PyTorch 2.4.0, PyTorch Geometric 2.6.1, and PyTorch Cluster 1.6.3, with CUDA 12.1 and Python 3.9.
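A reproduction attempt can compare its installed stack against the versions reported above before running anything. The sketch below is a hypothetical check (the repository is not stated to ship one); `EXPECTED` and `check_versions` are names invented here, and the installed-version mapping would in practice come from `importlib.metadata.version`.

```python
# Reported stack: PyTorch 2.4.0, PyTorch Geometric 2.6.1,
# PyTorch Cluster 1.6.3, CUDA 12.1, Python 3.9.
EXPECTED = {
    "torch": "2.4.0",
    "torch_geometric": "2.6.1",
    "torch_cluster": "1.6.3",
}

def check_versions(installed):
    # `installed` maps package name -> version string (or is missing the
    # key entirely). Returns {package: (expected, found)} for every
    # mismatch; an empty dict means the environment matches the paper.
    return {
        pkg: (want, installed.get(pkg))
        for pkg, want in EXPECTED.items()
        if installed.get(pkg) != want
    }
```

Returning the mismatches rather than raising lets a reproduction script log every deviation at once instead of failing on the first one.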
Experiment Setup | Yes | In our setup, we use the AdamW optimizer with weight decay and apply early stopping after 100 epochs. Label smoothing is set to 0.05, and gradient clipping is fixed at 1.0 to stabilize training. The learning rate follows a warm-up schedule with 100 warm-up steps by default. The batch size is set to 256 by default. We perform hyperparameter search over the following ranges: learning rate {1e-2, 5e-3, 1e-3}, positional embedding dimension {4, 8, 20}, dropout {0.1, 0.3, 0.5}, weight decay {1e-2, 0}, and weighting coefficient λ {0.1, 0.5, 1.0}. For pattern sampling, we set p = 1, q = 1 by default (see Appendix G for details). The model configuration includes a hidden dimension of 256, 4 attention heads, and 1 transformer layer. The selected hyperparameters are summarized in Table 6.
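Two pieces of the setup above lend themselves to a worked sketch: the 100-step learning-rate warm-up and the size of the reported hyperparameter grid. Both snippets are illustrations under stated assumptions, the linear warm-up shape in particular is an assumption, since the paper only states the number of warm-up steps, and `warmup_lr` is a name invented here.

```python
import itertools

def warmup_lr(step, base_lr, warmup_steps=100):
    # Scale the base learning rate up over the first `warmup_steps`
    # optimizer steps, then hold it constant. Linear ramp assumed;
    # the paper specifies only "100 warm-up steps by default".
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# The reported search ranges, enumerated as a full grid.
grid = {
    "lr": [1e-2, 5e-3, 1e-3],
    "pos_dim": [4, 8, 20],
    "dropout": [0.1, 0.3, 0.5],
    "weight_decay": [1e-2, 0],
    "lambda": [0.1, 0.5, 1.0],
}
configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
# 3 * 3 * 3 * 2 * 3 = 162 candidate configurations per dataset
```

Even this modest grid yields 162 runs per dataset, which helps explain the default batch size and early-stopping budget: the search cost dominates the per-run cost.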