AutoGFM: Automated Graph Foundation Model with Adaptive Architecture Customization

Authors: Haibo Chen, Xin Wang, Zeyang Zhang, Haoyang Li, Ling Feng, Wenwu Zhu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that AutoGFM outperforms baselines, achieving state-of-the-art performance. The contributions of this paper are summarized as follows: We conduct extensive experiments on eight datasets to demonstrate the superiority of our method over state-of-the-art baselines.
Researcher Affiliation | Academia | Department of Computer Science and Technology, BNRIST, Tsinghua University, Beijing, China. Correspondence to: Xin Wang <EMAIL>, Wenwu Zhu <EMAIL>.
Pseudocode | Yes | Algorithm 1: Training pipeline for AutoGFM
Open Source Code | No | The paper mentions using and reproducing results from other methods' publicly available code, but does not provide a statement or link for the code of the AutoGFM methodology itself.
Open Datasets | Yes | Datasets: We employ datasets with diverse domains and tasks. For node-level tasks, we utilize citation networks (Cora, PubMed, and Arxiv) and the web link network (WikiCS). For edge-level tasks, we utilize knowledge graphs (WN18RR, FB15K237). For graph-level tasks, we utilize molecular datasets (HIV, PCBA, and ChEMBL). Following (Liu et al., 2023a), we use the textual encoder to unify the node features from different domains. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, 33:22118–22133, 2020.
Dataset Splits | Yes | Dataset splitting: We adopt the same splitting strategy as (Liu et al., 2023a; Wang et al., 2024b). For Cora and PubMed, we select 20 labeled nodes per class for training and utilize a predefined set of 10 splits with different random seeds to compute the average performance. For WikiCS, we report the average accuracy over 20 distinct training splits, each generated with 20 different random seeds; in each split, 5% of the nodes from each class are used for training. For Arxiv, HIV, and PCBA, we employ the official dataset splits and conduct experiments 10 times using different random seeds to determine the average accuracy. The FB15K237 dataset consists of 272,115 edges in the training set, 17,535 in the validation set, and 20,466 in the test set; for WN18RR, the corresponding numbers are 86,835, 3,034, and 3,134, respectively. Each experiment is repeated 10 times with different random seeds, and the final results are reported as the average accuracy.
Hardware Specification | Yes | GPU: NVIDIA A100-SXM4-40GB and NVIDIA A100-SXM4-80GB. CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz.
Software Dependencies | Yes | Software: Python 3.9, CUDA 12.2, PyTorch (Paszke et al., 2019) 1.13.1.
Experiment Setup | Yes | We evaluate different GNN architectures and GNAS methods based on GFT (Wang et al., 2024b), following the default hyperparameters of GFT to maintain consistency. To ensure a fair comparison, we set the dimensionality of all methods to 768, use the same search space and operations (GCN, GIN, GAT, GraphSAGE, GraphConv), and fix the number of layers to 2. For our method, we explore hyperparameters λ, β ∈ {1e-1, 1e-2, 1e-3, 1e-4} and empirically select λ and β. The learning rate of the disentangled contrastive graph encoder is set to 5e-3, and the learning rate of the architecture predictor is set to 3e-2. The dimensionality of both the graph encoder and the supernet is 768. Each experiment is conducted 10 times, and we report the average performance along with standard deviations.
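The reported setup (768-dimensional encoder and supernet, a 2-layer search space over five GNN operations, a grid over λ and β, and results averaged over 10 seeded runs) can be sketched as a minimal configuration-and-aggregation loop. This is an illustrative sketch only: `run_experiment` is a hypothetical stand-in returning placeholder scores, not the paper's training code.

```python
import statistics

# Hyperparameters as reported in the paper's experiment setup.
CONFIG = {
    "hidden_dim": 768,               # dimensionality of encoder and supernet
    "num_layers": 2,                 # fixed number of GNN layers
    "operations": ["GCN", "GIN", "GAT", "GraphSAGE", "GraphConv"],
    "lambda_grid": [1e-1, 1e-2, 1e-3, 1e-4],  # grid explored for λ
    "beta_grid": [1e-1, 1e-2, 1e-3, 1e-4],    # grid explored for β
    "lr_encoder": 5e-3,              # disentangled contrastive graph encoder
    "lr_predictor": 3e-2,            # architecture predictor
    "num_runs": 10,                  # repeats with different random seeds
}

def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one seeded run; returns test accuracy."""
    # A real run would train with CONFIG under this seed and evaluate
    # on the dataset's test split; here we return placeholder scores.
    return 0.80 + 0.001 * (seed % 3)

def averaged_result(num_runs: int) -> tuple:
    """Mean and standard deviation over seeded runs, as the paper reports."""
    scores = [run_experiment(seed) for seed in range(num_runs)]
    return statistics.mean(scores), statistics.stdev(scores)

mean_acc, std_acc = averaged_result(CONFIG["num_runs"])
print(f"accuracy: {mean_acc:.4f} +/- {std_acc:.4f}")
```

The same seed-then-average pattern applies to every dataset in the report (10 runs for Cora, PubMed, Arxiv, HIV, PCBA, and the knowledge graphs; 20 splits for WikiCS), so only the split source and run count change per dataset.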