Graph Knowledge Distillation to Mixture of Experts
Authors: Pavel Rumiantsev, Mark Coates
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a series of experiments showing that our approach can be efficiently and effectively applied to datasets of various sizes. To evaluate our model, we explore both transductive and inductive settings for nine publicly available real-world datasets. We show that our model can utilize additional parameters more efficiently than a parameter-inflated MLP, an ensemble of MLPs, or a vanilla mixture-of-experts model. We conduct an ablation study to show how the various loss terms influence accuracy. |
| Researcher Affiliation | Academia | Pavel Rumiantsev EMAIL The Department of Electrical and Computer Engineering, McGill University; Mark Coates EMAIL The Department of Electrical and Computer Engineering, McGill University |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but it does not include a distinct section labeled "Pseudocode" or "Algorithm", nor does it present any formatted code blocks. |
| Open Source Code | Yes | Code available at https://github.com/Rufaim/routing-by-memory. |
| Open Datasets | Yes | To conduct our experiments we use nine real-world datasets: Cora (Sen et al., 2008), Citeseer (Giles et al., 1998), Pubmed (McCallum et al., 2000), Amazon-Photo, Amazon-Computers, Academic-CS, Academic-Physics (Shchur et al., 2018), OGB-Arxiv and OGB-Products (Hu et al., 2020). |
| Dataset Splits | Yes | For the Cora, Citeseer, and Pubmed datasets, we follow the data splitting strategy specified by Kipf & Welling (2016). For Amazon-Photo, Amazon-Computers, Academic-CS, and Academic-Physics, we follow the procedure employed by Zhang et al. (2021b), Tian et al. (2022) and Wu et al. (2023): we randomly split the data into train/val/test subsets, and each random seed corresponds to a different data split. For OGB-Arxiv and OGB-Products we use the public data splits provided by Hu et al. (2020). For the inductive setting, we split the unlabeled nodes, V_U, into a set of observed nodes, V_U^obs, and a set of inductive nodes, V_U^ind, by randomly selecting 20% of the nodes as the inductive subset, following the procedure of Tian et al. (2022) and Zhang et al. (2021b). |
| Hardware Specification | Yes | Our experiments were conducted using an NVIDIA Tesla V100 GPU with 32GB of memory. The machine has an Intel Xeon Gold 6140 CPU with clock frequency of 2.30GHz and total thread count of 36. |
| Software Dependencies | No | We use Ray Tune (Liaw et al., 2018) to tune model hyperparameters. Specifically, we use the Optuna search algorithm (Akiba et al., 2019). We use the Adam optimizer (Kingma & Ba, 2014). |
| Experiment Setup | Yes | We tuned the following model structure hyperparameters: (i) dropout rate was selected from [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6] and applied to all dropout layers in the model; (ii) total number of experts was selected from [4, 5, 6, 7, 8]. In addition to the structure hyperparameters, we selected the following training hyperparameters: (i) learning rate for the Adam optimizer (Kingma & Ba, 2014) was chosen from [0.01, 0.005, 0.001]; (ii) weight α of the commitment loss (6) from the range [0.0, 0.1]; (iii) weights β and γ of the load-balancing loss (8) and self-similarity loss (7), respectively, from the range [0.0, 0.05]. In our experiments, we set λ0 = 0.9, T = 200 and = 0.05. |
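The inductive evaluation protocol quoted in the Dataset Splits row (randomly holding out 20% of the unlabeled nodes as the inductive subset) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code; the function name, the use of NumPy, and the seed handling are assumptions.

```python
import numpy as np

def inductive_split(unlabeled_nodes, ind_fraction=0.2, seed=0):
    """Split unlabeled node indices into observed and inductive subsets.

    Following the protocol of Tian et al. (2022) and Zhang et al. (2021b),
    a random fraction (20% by default) of the unlabeled nodes V_U is held
    out as the inductive set V_U^ind; the rest form the observed set
    V_U^obs. A different seed yields a different split.
    """
    rng = np.random.default_rng(seed)
    nodes = np.asarray(unlabeled_nodes)
    perm = rng.permutation(len(nodes))          # shuffle node positions
    n_ind = int(round(ind_fraction * len(nodes)))
    ind_idx, obs_idx = perm[:n_ind], perm[n_ind:]
    return nodes[obs_idx], nodes[ind_idx]       # (V_U^obs, V_U^ind)
```

In a transductive run the observed unlabeled nodes are visible to the teacher GNN during training, while the inductive nodes are only used at evaluation time, which is what makes per-seed random splits necessary for reporting variance across runs.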