Machine Learning on Graphs: A Model and Comprehensive Taxonomy

Authors: Ines Chami, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, Kevin Murphy

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type: Theoretical
We propose a comprehensive taxonomy of GRL methods, aiming to unify several disparate bodies of work. Specifically, we propose the GraphEDM framework, which generalizes popular algorithms for semi-supervised learning (e.g. GraphSAGE, GCN, GAT) and unsupervised learning (e.g. DeepWalk, node2vec) of graph representations into a single consistent approach. To illustrate the generality of GraphEDM, we fit over thirty existing methods into this framework. We believe that this unifying view both provides a solid foundation for understanding the intuition behind these methods, and enables future research in the area.
Researcher Affiliation: Collaboration
Ines Chami, Stanford University, Stanford, CA 94305, USA; Sami Abu-El-Haija, USC Information Sciences Institute, Marina Del Rey, CA 90292, USA; Bryan Perozzi, Google Research, New York, NY 10011, USA; Christopher Ré, Stanford University, Stanford, CA 94305, USA; Kevin Murphy, Google Research, Mountain View, CA 94043, USA
Pseudocode: No
The paper describes methods and frameworks for machine learning on graphs using prose and mathematical equations, but it does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes
We release an open-source library for GRL which includes state-of-the-art GRL methods and important graph applications, including node classification and link prediction. Our implementation is publicly available at https://github.com/google/gcnn-survey-paper.
Open Datasets: Yes
Recently, there has been progress in this direction, including new graph benchmarks with leaderboards (Hu et al., 2020; Dwivedi et al., 2020) and graph embedding libraries (Fey and Lenssen, 2019; Wang et al., 2019; Goyal and Ferrara, 2018a).
Dataset Splits: No
The paper is a survey and does not report original experimental results requiring specific dataset splits. It discusses the importance of consistent evaluation, noting that 'citation benchmarks have drawbacks since results might significantly vary based on datasets splits, or training procedures (e.g. early stopping), as shown in recent work (Shchur et al., 2018).'
Hardware Specification: No
As a survey, the paper reports no original experimental results, so it does not specify hardware for its own experiments. It includes general discussion of scaling to large graphs and the computational resources this requires (e.g., a 'Distributed Systems setup with many machines, such as Map Reduce').
Software Dependencies: No
The paper describes a taxonomy and framework for graph machine learning methods and refers to an open-source library the authors released. However, it does not list specific software dependencies with version numbers for reproducing its work.
Experiment Setup: No
As a survey and taxonomy, the paper presents no original experimental results that would require a detailed experimental setup or hyperparameter specification. It discusses hyperparameters only in the context of existing methods, e.g., 'DeepWalk and node2vec sampling strategies use hyper-parameters to control this, such as the length of the walk or ratio between breadth and depth exploration.'
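
The GraphEDM framework mentioned in the Research Type row casts GRL methods as a graph encoder producing node embeddings, decoders that reconstruct structure or predict labels, and a composite training loss. The sketch below illustrates that decomposition only; all names, shapes, and loss weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hedged sketch of the encoder-decoder view: Z = ENC(X, A) gives node
# embeddings, one decoder reconstructs the adjacency, another predicts
# labels, and training combines supervised, graph-regularization, and
# weight-regularization terms. Toy data and weights throughout.
rng = np.random.default_rng(0)

N, D, H, C = 6, 4, 3, 2                        # nodes, features, embed dim, classes
X = rng.normal(size=(N, D))                    # node features
A = (rng.random((N, N)) < 0.3).astype(float)   # toy adjacency
A = np.maximum(A, A.T)                         # symmetrize
np.fill_diagonal(A, 0)
y = rng.integers(0, C, size=N)                 # toy node labels

W_enc = rng.normal(scale=0.1, size=(D, H))
W_dec = rng.normal(scale=0.1, size=(H, C))

def encode(X, A, W):
    # one GCN-style propagation step: average over neighbors (with
    # self-loops), project, apply a nonlinearity
    A_hat = A + np.eye(len(A))
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.tanh((A_hat / deg) @ X @ W)

Z = encode(X, A, W_enc)                        # embeddings, shape (N, H)

# graph decoder: inner-product similarity compared against the adjacency
S = Z @ Z.T
loss_graph = np.mean((S - A) ** 2)

# label decoder: softmax classifier on the embeddings
logits = Z @ W_dec
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
loss_sup = -np.mean(np.log(p[np.arange(N), y] + 1e-12))

# composite loss: supervised + graph regularization + weight regularization
# (the weights alpha, beta, gamma are assumed values for illustration)
alpha, beta, gamma = 1.0, 0.5, 1e-4
loss = (alpha * loss_sup
        + beta * loss_graph
        + gamma * (np.sum(W_enc**2) + np.sum(W_dec**2)))
print(float(loss))
```

Different GRL methods then correspond to different choices of encoder, decoder, and loss weights, e.g. an unsupervised embedding method keeps only the graph-reconstruction term, while a semi-supervised GNN emphasizes the label term.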