A Manifold Perspective on the Statistical Generalization of Graph Neural Networks

Authors: Zhiyang Wang, Juan Cervino, Alejandro Ribeiro

ICML 2025

Reproducibility Variable: Result. LLM Response
Research Type: Experimental. We demonstrate our generalization bounds of GNNs using synthetic and multiple real-world datasets. We provide extensive experiments on both synthetic and real-world datasets to verify our generalization conclusions.
Researcher Affiliation: Academia. 1) Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, USA; 2) Laboratory of Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, USA.
Pseudocode: No. The paper describes its algorithms and methods in textual form but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide a link to source code, nor does it state that code is released or available in supplementary materials for the described methodology.
Open Datasets: Yes. We verify our theoretical results on both synthetic and real-world datasets. Specifically, in Section 5 the following datasets are considered: OGBN-Arxiv (Wang et al., 2020; Mikolov et al., 2013), Cora (Yang et al., 2016b), CiteSeer (Yang et al., 2016b), PubMed (Yang et al., 2016b), Coauthors CS (Shchur et al., 2018), Coauthors Physics (Shchur et al., 2018), Amazon-rating (Platonov et al., 2023), and Roman-Empire (Platonov et al., 2023); details of the datasets can be found in Table 1. All datasets used in the paper are public and free to use. They can be downloaded using the PyTorch Geometric package (https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html), the ogb package (https://ogb.stanford.edu/docs/nodeprop/), and the Princeton ModelNet project (https://modelnet.cs.princeton.edu/).
Dataset Splits: Yes. In all cases, we vary the number of nodes in the training set by partitioning it into {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024} partitions when possible. For the CS dataset, given that there are no predefined training and testing sets, we randomly partitioned the dataset and used 90% of the samples for training and the remaining 10% for testing. For the Amazon dataset, we used the 10 different splits that the dataset provides.
Hardware Specification: Yes. All experiments were done using an NVIDIA GeForce RTX 3090, and each set of experiments took at most 10 hours to complete.
Software Dependencies: No. The paper mentions using the 'pytorch' and 'ogb' packages for downloading datasets, but does not specify version numbers for these or any other software components used in its methodology.
Experiment Setup: Yes. We implement an Adam optimizer with the learning rate set to 0.005 and forgetting factors 0.9 and 0.999. We carry out the training for 40 epochs with the batch size set to 10. For the optimizer, we used AdamW with a learning rate of 0.01 and 0 weight decay. We trained using the graph convolutional layer, with a varying number of layers and hidden units. For dropout, we used 0.5. We trained using the cross-entropy loss.
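For reference, the Adam settings reported above (learning rate 0.005, forgetting factors 0.9 and 0.999) correspond to the standard Adam update rule, sketched below for a scalar parameter. This is purely illustrative; the paper's training loop is not available, and a real implementation would use an optimizer such as torch.optim.Adam with these hyperparameters.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.005, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (reported settings: lr=0.005,
    forgetting factors beta1=0.9, beta2=0.999). Illustrative sketch only."""
    m = beta1 * m + (1 - beta1) * grad       # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 1.0.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
```

With these settings each step moves the parameter by roughly the learning rate (0.005), so after a few hundred steps the toy parameter approaches the minimum at zero.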