Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment

Authors: Shuo Wang, Bokui Wang, Zhixiang Shen, Boyan Deng, Zhao Kang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on both homophilic and heterophilic graph datasets validate the robustness and efficacy of our method. Our code is available at https://github.com/wbkzwqtzw/MDGFM. In this section, we evaluate our proposed MDGFM on few-shot node classification task."
Researcher Affiliation | Academia | "University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China. Correspondence to: Shuo Wang <EMAIL>, Zhao Kang <EMAIL>."
Pseudocode | No | The paper describes the methodology in text and mathematical formulas across Sections 3 and 4, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | Yes | "Our code is available at https://github.com/wbkzwqtzw/MDGFM."
Open Datasets | Yes | "To ensure a comprehensive comparison, we conduct experiments on six primary datasets, including three homophilic graphs, Cora (Sen et al., 2008), Citeseer (Sen et al., 2008), and Pubmed (Namata et al., 2012), and three heterophilic graphs, Cornell, Chameleon, and Squirrel (Pei et al., 2020). Additionally, we include Penn94 (Traud et al., 2012), a large-scale graph dataset, as a downstream target domain."
Dataset Splits | Yes | "For both one-shot and few-shot classification tasks, we pretrain the models on five datasets and subsequently perform predictions on the remaining dataset, ensuring that the downstream domain remains unseen during the training phase. The detailed experimental setup is summarized in Table 7, where a checkmark (✓) indicates visibility during pre-training, while the absence of the mark denotes invisibility. ... For the downstream node classification tasks, we implement few-shot learning scenarios with 1-shot and 5-shot settings (3-shot for the Cornell and Squirrel datasets)."
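The k-shot downstream setup described above (k labeled nodes per class for the support set, the rest held out for evaluation) can be sketched as follows. This is a generic few-shot sampler for illustration, not the authors' exact split code; the function name `k_shot_split` and the seeding scheme are assumptions.

```python
import numpy as np

def k_shot_split(labels, k, seed=0):
    """Sample k labeled nodes per class as the support set for a
    k-shot episode; all remaining nodes form the query/test pool.

    labels: 1-D integer array of node class labels.
    Returns (support_idx, query_idx) as index arrays.
    """
    rng = np.random.default_rng(seed)
    support, query = [], []
    for c in np.unique(labels):
        # shuffle this class's node indices, then take the first k
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k].tolist())
        query.extend(idx[k:].tolist())
    return np.array(support), np.array(query)
```

For the 1-shot setting, `k=1` leaves exactly one labeled node per class visible to the downstream classifier; the 3-shot variant for Cornell and Squirrel is the same call with `k=3`.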
Hardware Specification | Yes | "All experiments are conducted on a platform equipped with an Intel(R) Xeon(R) Gold 5220 CPU and an NVIDIA A800 80GB GPU, using PyTorch 1.10.1 and DGL 0.9.1."
Software Dependencies | Yes | "All experiments are conducted on a platform equipped with an Intel(R) Xeon(R) Gold 5220 CPU and an NVIDIA A800 80GB GPU, using PyTorch 1.10.1 and DGL 0.9.1."
Experiment Setup | Yes | "We employ the Adam optimizer and set the batch size to 128. During the upstream pre-training phase, we utilize Principal Component Analysis (PCA) to reduce the dimensionality of the initial features to 50 dimensions, thereby unifying the features from multiple source domains. For homophilic graphs, we set k=30 for Graph Structure Learning (GSL), while for heterophilic graphs, we configure k=15. Additionally, we adjust the upstream and downstream learning rates across different datasets, as detailed in Table 8. We fix the number of graph neural layers to 3, with a hidden dimension of 256 for the GCN model. When dealing with homophilic graphs, the number of epochs is set to 60, whereas it is set to 100 for heterophilic graphs."
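The PCA step quoted above, which maps every source domain's node features into a shared 50-dimensional space before pre-training, can be sketched with a NumPy-only projection. This is a minimal illustration of the stated preprocessing, assuming a plain SVD-based PCA; the authors' pipeline may use a library implementation instead, and the function name `pca_reduce` is an assumption.

```python
import numpy as np

def pca_reduce(X, dim=50):
    """Project node features X (n_nodes x n_features) onto the top
    `dim` principal components, yielding a shared low-dimensional
    feature space across source domains.
    """
    # center features, then take principal directions from the SVD
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # rows of Vt are principal axes sorted by explained variance
    return Xc @ Vt[:dim].T
```

Applying `pca_reduce` to each source domain separately with the same `dim=50` gives every domain a feature matrix of identical width, which is what allows a single shared encoder to consume all five pre-training graphs.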