Accurate and Scalable Graph Neural Networks via Message Invariance

Authors: Zhihao Shi, Jie Wang, Zhiwei Zhuang, Xize Liang, Bin Li, Feng Wu

ICLR 2025

Reproducibility assessment: each entry lists the variable, the result, and the supporting excerpt from the LLM response.
Research Type: Experimental. Evidence: "Experiments demonstrate that TOP is significantly faster than existing mini-batch methods by an order of magnitude on vast graphs (millions of nodes and billions of edges) with limited accuracy degradation." ... "We conduct extensive experiments on graphs with various sizes to demonstrate that TOP is significantly faster than existing mini-batch methods with limited accuracy degradation (see Figures 3 and 5)."
Researcher Affiliation: Academia. Evidence: "Zhihao Shi^1, Jie Wang^1, Zhiwei Zhuang^1, Xize Liang^1, Bin Li^1, Feng Wu^1; ^1 MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China"
Pseudocode: Yes. Evidence: "1. Algorithm. We provide the pseudocode of TOP in Algorithms 1 and 2."
Open Source Code: Yes. Evidence: "3. Source Code. The code of TOP is available on GitHub at https://github.com/MIRALab-USTC/TOP."
Open Datasets: Yes. Evidence: "We evaluate TOP on five datasets with various sizes (i.e., Reddit (Hamilton et al., 2017), Yelp (Zeng et al., 2020), Ogbn-arxiv, Ogbn-products, and Ogbn-papers (Hu et al., 2020)). These datasets contain at least 100 thousand nodes and one million edges." ... "We also conduct experiments on heterophilous graphs in Appendix C.7." ... "We conduct experiments on five heterophilous graphs (i.e., roman-empire, amazon-ratings, minesweeper, tolokers, and questions) provided by the recent heterophilous benchmark (Platonov et al., 2023)."
Dataset Splits: Yes. Evidence: "Table 1: Statistics of the datasets in our experiments." Train/Val/Test node ratios:
  Reddit: 0.660/0.100/0.240
  Yelp: 0.750/0.150/0.100
  Ogbn-arxiv: 0.537/0.176/0.287
  Ogbn-products: 0.100/0.020/0.880
  Ogbn-papers100M: 0.780/0.080/0.140
Additionally: "Training, validation, and test node ratios are set at 0.8, 0.1, and 0.1, respectively."
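The split ratios above can be sanity-checked with a short script. This is a minimal sketch: the dictionary simply restates the ratios quoted from Table 1 of the paper, and the helper function name is our own.

```python
# Train/val/test node ratios as reported in Table 1 of the paper.
splits = {
    "Reddit":          (0.660, 0.100, 0.240),
    "Yelp":            (0.750, 0.150, 0.100),
    "Ogbn-arxiv":      (0.537, 0.176, 0.287),
    "Ogbn-products":   (0.100, 0.020, 0.880),
    "Ogbn-papers100M": (0.780, 0.080, 0.140),
}

def check_splits(splits, tol=1e-6):
    """Return the names of datasets whose ratios do not sum to 1."""
    return [name for name, parts in splits.items()
            if abs(sum(parts) - 1.0) > tol]

print(check_splits(splits))  # an empty list means every split sums to 1
```

Each of the five reported splits covers the full node set, so the check returns an empty list.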
Hardware Specification: Yes. Evidence: "We run all experiments in this section on a single GeForce RTX 2080 Ti (11 GB), and Intel Xeon CPU E5-2640 v4." ... "We run experiments in this section on a single A800 card."
Software Dependencies: No. The paper mentions implementing TOP, CLUSTER, SAINT, and GAS based on the code and toolkits of GAS (Fey et al., 2021) and refers to the official implementation of LABOR (Balin & Catalyurek, 2023), but it does not give explicit version numbers for any software libraries or frameworks, such as PyTorch or Python. The rule states that specific version numbers are required for reproducibility.
Experiment Setup: Yes. Evidence (Appendix B.3, Hyperparameters): "Comparison with subgraph sampling. To ensure a fair comparison, we follow the GNN architectures, the data splits, the training pipeline, and the hyperparameters of GCN and PNA in (Fey et al., 2021). We search for the best hyperparameters of GCNII, GAT, and SAGE for TOP, CLUSTER, and GAS in the same set." ... "Comparison with node/layer-wise sampling. We run NS and LABOR using the official implementation of LABOR (Balin & Catalyurek, 2023) with the corresponding hyperparameters. For TOP, we uniformly sample nodes to construct subgraphs. To ensure a fair comparison, TOP follows the data splits, training pipeline, learning rate, and hyperparameters of LABOR (Balin & Catalyurek, 2023). We adapt the batch size of TOP so that its memory consumption is similar to LABOR's."
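For context, the "uniformly sample nodes to construct subgraphs" step described above can be sketched in plain Python. This is an illustrative toy on a node-induced subgraph, not the paper's implementation; the function name, edge-list representation, and the cycle-graph example are all our own assumptions.

```python
import random

def sample_induced_subgraph(num_nodes, edges, batch_size, seed=0):
    """Uniformly sample `batch_size` nodes, then keep only the edges
    whose endpoints both fall in the sampled set (induced subgraph)."""
    rng = random.Random(seed)
    nodes = rng.sample(range(num_nodes), batch_size)
    keep = set(nodes)
    sub_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return nodes, sub_edges

# Toy example: a 6-node cycle graph.
edges = [(i, (i + 1) % 6) for i in range(6)]
nodes, sub_edges = sample_induced_subgraph(6, edges, batch_size=4)
print(len(nodes))  # 4
```

In a real mini-batch pipeline the batch size would be tuned, as the paper does, so that the sampled subgraph fits a target memory budget.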