Cooperative Minibatching in Graph Neural Networks
Authors: Muhammed Fatih Balın, Dominique LaSalle, Ümit V. Çatalyürek
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluations show up to 4x bandwidth savings for fetching vertex embeddings, by simply increasing this dependency without harming model convergence. Combining our proposed approaches, we achieve up to 64% speedup over Independent Minibatching on single-node multi-GPU systems, using the same resources. ... 4 Experimental Evaluation: We first compare how the work to process an epoch changes w.r.t. the batch size to empirically validate Theorems 3.1 and 3.2 for different graph sampling algorithms. Next, we show how the dependent batches introduced in Section 3.2 benefit GNN training. We also show the runtime benefits of cooperative minibatching compared to independent minibatching in the multi-GPU setting. Finally, we show that these two techniques are orthogonal and can be combined to get multiplicative savings. |
| Researcher Affiliation | Collaboration | Muhammed Fatih Balın, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA; Dominique LaSalle, NVIDIA Corporation, Santa Clara, CA, USA; Ümit V. Çatalyürek, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA |
| Pseudocode | Yes | Algorithm 1: Cooperative minibatching. Input: seed vertices S^0_p for each PE p ∈ P, # layers L. for all l ∈ {0, …, L−1} do: {Sampling} for all p ∈ P do in parallel: sample next-layer vertices S^{l+1}_p and edges E^l_p for S^l_p; all-to-all to redistribute vertex ids for S^{l+1}_p to get S^{l+1}_p; for all p ∈ P do in parallel: {Feature Loading} … |
| Open Source Code | Yes | Source code is available at https://github.com/GT-TDAlab/dgl-coop/tree/dist_graph_squashed_wip_cache |
| Open Datasets | Yes | In our experiments, we use the following datasets: reddit (Hamilton et al., 2017), papers100M (Hu et al., 2020a), mag240M (Hu et al., 2021), yelp and flickr (Zeng et al., 2020), and their details are given in Table 2. |
| Dataset Splits | Yes | Table 2: Traits of datasets used in experiments: numbers of vertices, edges, avg. degree, features, cached vertex embeddings, and training/validation/test vertex splits; the last column gives the # minibatches per epoch during model training with batch size 1024, including validation. … flickr: 89.2K vertices, 900K edges, avg. degree 10.09, 500 features, 70k cached, 50.00/25.00/25.00 train/val/test split (%), 65 minibatches; yelp: … 75.00/10.00/15.00 …; reddit: … 66.00/10.00/24.00 …; papers100M: … 1.09/0.11/0.19 …; mag240M: … 0.45/0.06/0.04 |
| Hardware Specification | Yes | We present our runtime results on systems equipped with NVIDIA GPUs, with 4 and 8 A100 80 GB (NVIDIA, 2021) and 16 V100 32GB (NVIDIA, 2020b), all with NVLink interconnect between the GPUs (600 GB/s for A100 and 300 GB/s for V100). |
| Software Dependencies | No | We implemented our experimental code using C++ and Python in the DGL framework (Wang et al., 2019) with the Pytorch backend (Paszke et al., 2019). No specific version numbers for C++, Python, DGL, or Pytorch are provided. |
| Experiment Setup | Yes | All our experiments involve a GCN model with L = 3 layers (Hamilton et al., 2017), with 1024 hidden dimension for mag240M and papers100M and 256 for the rest. Additionally, the papers100M and mag240M datasets were made undirected for all experiments, and this is reflected in the reported edge counts in Table 2. Input features of mag240M are stored with the 16-bit floating point type. We use the Adam optimizer (Kingma & Ba, 2014) with a 10^-3 learning rate in all the experiments. ... We used a fanout of k = 10 for the samplers. In addition, Random Walks used a length of o = 3, restart probability p = 0.5, and a = 100 random walks from each seed. |
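
The Algorithm 1 excerpt quoted in the Pseudocode row can be sketched as a single-process simulation of the cooperative sampling loop. Everything here is an illustrative assumption, not the paper's DGL implementation: `sample_neighbors`, `owner`, and the modulo partitioning of vertex ids are hypothetical stand-ins for the per-PE sampling, ownership, and all-to-all redistribution steps.

```python
# Minimal sketch of Algorithm 1 (cooperative minibatching), simulating
# P processing elements in one process. Names and the modulo-based vertex
# ownership are assumptions for illustration only.
import random

def sample_neighbors(graph, seeds, fanout, rng):
    """Sample up to `fanout` neighbors per seed; return (edges, next-layer vertices)."""
    edges, nxt = [], set()
    for v in seeds:
        nbrs = graph.get(v, [])
        for u in rng.sample(nbrs, min(fanout, len(nbrs))):
            edges.append((u, v))
            nxt.add(u)
    return edges, nxt

def owner(v, num_pes):
    # Assumed vertex partitioning: PE p owns vertices with id % num_pes == p.
    return v % num_pes

def cooperative_minibatch(graph, seeds_per_pe, num_layers, fanout, num_pes, seed=0):
    rng = random.Random(seed)
    frontiers = [set(s) for s in seeds_per_pe]      # S^0_p for each PE p
    all_edges = [[] for _ in range(num_pes)]
    for _ in range(num_layers):
        # {Sampling}: each PE samples the next layer for its local frontier.
        sampled = [sample_neighbors(graph, frontiers[p], fanout, rng)
                   for p in range(num_pes)]
        # All-to-all: route each sampled vertex id to its owner PE, so a
        # vertex duplicated across PEs is fetched only once, by its owner.
        new_frontiers = [set() for _ in range(num_pes)]
        for p in range(num_pes):
            edges, nxt = sampled[p]
            all_edges[p].extend(edges)
            for u in nxt:
                new_frontiers[owner(u, num_pes)].add(u)
        frontiers = new_frontiers
    return frontiers, all_edges
```

After the all-to-all step, each PE holds a disjoint slice of the next frontier, which is what allows the paper's bandwidth savings: shared neighbors are loaded once per owner rather than once per PE.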
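
The reported model configuration (3-layer GCN, hidden dimension 256 or 1024, Adam at 10^-3) can be illustrated with a minimal dense-GCN forward pass in NumPy. Only the layer count, hidden sizes, and learning rate come from the report; the mean-aggregation rule, the weight initialization, and all function names are assumptions for illustration, not the authors' DGL/PyTorch code.

```python
# Illustrative sketch of the reported setup: a 3-layer GCN. Dense adjacency
# and mean aggregation are simplifying assumptions; the paper's actual model
# runs on sampled subgraphs in DGL with a PyTorch backend.
import numpy as np

def gcn_layer(adj, x, w):
    """Mean-aggregate neighbor features, then linear transform + ReLU."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    return np.maximum((adj @ x / deg) @ w, 0.0)

def gcn_forward(adj, x, weights):
    for w in weights[:-1]:
        x = gcn_layer(adj, x, w)
    # Final layer: no ReLU, produces class logits.
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    return (adj @ x / deg) @ weights[-1]

def init_weights(in_dim, hidden, num_classes, num_layers=3, seed=0):
    # L = 3 layers as reported; hidden = 256 (or 1024 for the large graphs).
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * (num_layers - 1) + [num_classes]
    return [rng.standard_normal((a, b)) * 0.1 for a, b in zip(dims, dims[1:])]
```

In the reported experiments this model would be trained with Adam at a 10^-3 learning rate on minibatches sampled with fanout k = 10; those pieces are omitted here to keep the sketch self-contained.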