GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks
Authors: Taraneh Younesian, Daniel Daza, Emile van Krieken, Thiviyan Thanapalasingam, Peter Bloem
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate GRAPES on various node classification benchmarks involving homophilous as well as heterophilous graphs. We demonstrate GRAPES' effectiveness in accuracy and scalability, particularly on multi-label heterophilous graphs. Our experiments aim to answer the following research question: given a fixed sampling budget and GNN architecture, what is the effect of training with an adaptive policy for layer-wise sampling, in comparison with the related work? |
| Researcher Affiliation | Academia | Taraneh Younesian EMAIL Vrije Universiteit Amsterdam Daniel Daza EMAIL Amsterdam UMC Emile van Krieken EMAIL University of Edinburgh Thiviyan Thanapalasingam EMAIL University of Amsterdam Peter Bloem EMAIL Vrije Universiteit Amsterdam |
| Pseudocode | Yes | Algorithm 1 One GRAPES epoch |
| Open Source Code | Yes | Our implementation is publicly available online.1 1Available at https://github.com/dfdazac/grapes. |
| Open Datasets | Yes | Homophilous graphs: citation networks (Cora, Citeseer, Pubmed with the full split) (Sen et al., 2008; Yang et al., 2016), Reddit (Hamilton et al., 2017), ogbn-arxiv and ogbn-products (Hu et al., 2020), and DBLP (Zhao et al., 2023). Heterophilous graphs: Flickr (Zeng et al., 2019), Yelp (Zeng et al., 2019), ogbn-proteins (Hu et al., 2020), Blog Cat (Zhao et al., 2023), and snap-patents (Leskovec & Krevl, 2014). |
| Dataset Splits | Yes | The splits for Cora, Citeseer, and Pubmed correspond to the full splits, in which the label rate is higher than in the public splits. For Blog Cat, we take the average accuracy of all the methods across the three available splits provided by (Zhao et al., 2023). For DBLP and snap-patents, we use the average of ten random splits because these two datasets had no predefined splits. |
| Hardware Specification | Yes | We conducted our experiments on a machine with Nvidia RTX A4000 GPU (16GB GPU memory), Nvidia A100 (40GB GPU memory), and Nvidia RTX A6000 GPU (48GB GPU memory) and each machine had 48 CPUs. |
| Software Dependencies | No | We implemented the GCNs in GRAPES via PyTorch Geometric (Fey & Lenssen, 2019). We used the Adam optimizer (Kingma & Ba, 2014) for GCN_C and GCN_S. The paper mentions software like PyTorch Geometric and the Adam optimizer with citations, but does not provide specific version numbers (e.g., PyTorch Geometric 1.x). |
| Experiment Setup | Yes | For all experiments, we used as architecture the Graph Convolutional Network (Kipf & Welling, 2016), with two layers, a hidden size of 256, a batch size of 256, and a sampling size of 256 nodes per layer. We train for 50 epochs on Cora, Citeseer, and Reddit; 100 epochs on Blog Cat, DBLP, Flickr, ogbn-products, Pubmed, snap-patents, and Yelp; and 150 epochs on ogbn-arxiv and ogbn-proteins. The following are the hyperparameters that we tuned: the learning rate of the GFlowNet, the learning rate of the classification GCN, and the scaling parameter α. We sampled these hyperparameters from log-uniform distributions over the ranges [1e-6, 1e-2], [1e-6, 1e-2], and [1e2, 1e6], respectively. |
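The experiment setup above specifies a two-layer GCN (Kipf & Welling, 2016) with hidden size 256. As a minimal sketch of what that architecture computes — not the paper's implementation, which uses PyTorch Geometric — the propagation rule softmax(Â ReLU(Â X W₁) W₂) with symmetric normalization can be written in plain NumPy; the toy graph, feature dimension, and random weights here are illustrative:

```python
import numpy as np

def normalize_adj(adj):
    """Symmetrically normalize A + I, i.e. D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

def gcn_forward(adj, x, w1, w2):
    """Two-layer GCN forward pass: softmax(A_hat ReLU(A_hat X W1) W2)."""
    a_norm = normalize_adj(adj)
    h = np.maximum(a_norm @ x @ w1, 0.0)          # hidden layer with ReLU
    logits = a_norm @ h @ w2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)        # row-wise softmax

# Toy path graph: 4 nodes, 8 input features, hidden size 256, 3 classes.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 256)) * 0.1
w2 = rng.standard_normal((256, 3)) * 0.1
probs = gcn_forward(adj, x, w1, w2)   # shape (4, 3), rows sum to 1
```

In GRAPES, the classification GCN of this form is trained only on the nodes retained by the layer-wise sampler, which is what keeps memory bounded on large graphs.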
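The hyperparameter search described above draws learning rates and the scaling parameter α from log-uniform distributions, so that every order of magnitude in a range is equally likely. A small self-contained sketch of such a sampler follows; the key names in `search_space` are hypothetical labels for the three tuned hyperparameters, while the ranges are those reported in the setup:

```python
import math
import random

def sample_log_uniform(low, high, rng=random):
    """Draw one value whose logarithm is uniform over [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

# Hypothetical names; ranges taken from the reported experiment setup.
search_space = {
    "lr_gflownet": (1e-6, 1e-2),  # learning rate of the GFlowNet sampler
    "lr_gcn":      (1e-6, 1e-2),  # learning rate of the classification GCN
    "alpha":       (1e2, 1e6),    # scaling parameter
}

# One trial of the random search: an independent draw per hyperparameter.
trial = {name: sample_log_uniform(lo, hi)
         for name, (lo, hi) in search_space.items()}
```

Sampling in log space rather than uniformly is the standard choice when a range spans several orders of magnitude, as [1e-6, 1e-2] does here.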