GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks
Authors: Taraneh Younesian, Daniel Daza, Emile van Krieken, Thiviyan Thanapalasingam, Peter Bloem
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate GRAPES on various node classification benchmarks involving homophilous as well as heterophilous graphs. We demonstrate GRAPES' effectiveness in accuracy and scalability, particularly on multi-label heterophilous graphs. Our experiments aim to answer the following research question: given a fixed sampling budget and GNN architecture, what is the effect of training with an adaptive policy for layer-wise sampling, in comparison with the related work? |
| Researcher Affiliation | Academia | Taraneh Younesian EMAIL Vrije Universiteit Amsterdam Daniel Daza EMAIL Amsterdam UMC Emile van Krieken EMAIL University of Edinburgh Thiviyan Thanapalasingam EMAIL University of Amsterdam Peter Bloem EMAIL Vrije Universiteit Amsterdam |
| Pseudocode | Yes | Algorithm 1 One GRAPES epoch |
| Open Source Code | Yes | Our implementation is publicly available online.1 1Available at https://github.com/dfdazac/grapes. |
| Open Datasets | Yes | Homophilous graphs: citation networks (Cora, Citeseer, Pubmed with the full split) (Sen et al., 2008; Yang et al., 2016), Reddit (Hamilton et al., 2017), ogbn-arxiv and ogbn-products (Hu et al., 2020), and DBLP (Zhao et al., 2023). Heterophilous graphs: Flickr (Zeng et al., 2019), Yelp (Zeng et al., 2019), ogbn-proteins (Hu et al., 2020), Blog Cat (Zhao et al., 2023), and snap-patents (Leskovec & Krevl, 2014). |
| Dataset Splits | Yes | The splits for Cora, Citeseer, and Pubmed correspond to the full splits, in which the label rate is higher than in the public splits. For Blog Cat, we take the average accuracy of all the methods across the three available splits provided by (Zhao et al., 2023). For DBLP and snap-patents, we use the average of ten random splits because these two datasets had no predefined splits. |
| Hardware Specification | Yes | We conducted our experiments on a machine with Nvidia RTX A4000 GPU (16GB GPU memory), Nvidia A100 (40GB GPU memory), and Nvidia RTX A6000 GPU (48GB GPU memory) and each machine had 48 CPUs. |
| Software Dependencies | No | We implemented the GCNs in GRAPES via PyTorch Geometric (Fey & Lenssen, 2019). We used the Adam optimizer (Kingma & Ba, 2014) for GCN_C and GCN_S. The paper mentions software like PyTorch Geometric and the Adam optimizer with citations, but does not provide specific version numbers (e.g., PyTorch Geometric 1.x). |
| Experiment Setup | Yes | For all experiments, we used as architecture the Graph Convolutional Network (Kipf & Welling, 2016), with two layers, a hidden size of 256, a batch size of 256, and a sampling size of 256 nodes per layer. We train for 50 epochs on Cora, Citeseer, and Reddit; 100 epochs on Blog Cat, DBLP, Flickr, ogbn-products, Pubmed, snap-patents, and Yelp; and 150 epochs on ogbn-arxiv and ogbn-proteins. The following are the hyperparameters that we tuned: the learning rate of the GFlowNet, the learning rate of the classification GCN, and the scaling parameter α. We sampled these hyperparameters from log-uniform distributions over the ranges [1e-6, 1e-2], [1e-6, 1e-2], and [1e2, 1e6], respectively. |
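The experiment setup above specifies a two-layer GCN (Kipf & Welling, 2016) with hidden size 256. As a minimal sketch of what that architecture computes — not the paper's implementation, which uses PyTorch Geometric — the propagation rule softmax(Â ReLU(Â X W₁) W₂) with symmetric normalization can be written in plain NumPy; the toy graph, feature dimension, and random weights here are illustrative:

```python
import numpy as np

def normalize_adj(adj):
    """Symmetrically normalize A + I, i.e. D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

def gcn_forward(adj, x, w1, w2):
    """Two-layer GCN forward pass: softmax(A_hat ReLU(A_hat X W1) W2)."""
    a_norm = normalize_adj(adj)
    h = np.maximum(a_norm @ x @ w1, 0.0)          # hidden layer with ReLU
    logits = a_norm @ h @ w2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)        # row-wise softmax

# Toy path graph: 4 nodes, 8 input features, hidden size 256, 3 classes.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 256)) * 0.1
w2 = rng.standard_normal((256, 3)) * 0.1
probs = gcn_forward(adj, x, w1, w2)   # shape (4, 3), rows sum to 1
```

In GRAPES, the classification GCN of this form is trained only on the nodes retained by the layer-wise sampler, which is what keeps memory bounded on large graphs.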
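The hyperparameter search described above draws learning rates and the scaling parameter α from log-uniform distributions, so that every order of magnitude in a range is equally likely. A small self-contained sketch of such a sampler follows; the key names in `search_space` are hypothetical labels for the three tuned hyperparameters, while the ranges are those reported in the setup:

```python
import math
import random

def sample_log_uniform(low, high, rng=random):
    """Draw one value whose logarithm is uniform over [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

# Hypothetical names; ranges taken from the reported experiment setup.
search_space = {
    "lr_gflownet": (1e-6, 1e-2),  # learning rate of the GFlowNet sampler
    "lr_gcn":      (1e-6, 1e-2),  # learning rate of the classification GCN
    "alpha":       (1e2, 1e6),    # scaling parameter
}

# One trial of the random search: an independent draw per hyperparameter.
trial = {name: sample_log_uniform(lo, hi)
         for name, (lo, hi) in search_space.items()}
```

Sampling in log space rather than uniformly is the standard choice when a range spans several orders of magnitude, as [1e-6, 1e-2] does here.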