Machine Learning on Graphs: A Model and Comprehensive Taxonomy

Authors: Ines Chami, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, Kevin Murphy

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type: Theoretical
We propose a comprehensive taxonomy of GRL methods, aiming to unify several disparate bodies of work. Specifically, we propose the GraphEDM framework, which generalizes popular algorithms for semi-supervised learning (e.g. GraphSAGE, GCN, GAT) and unsupervised learning (e.g. DeepWalk, node2vec) of graph representations into a single consistent approach. To illustrate the generality of GraphEDM, we fit over thirty existing methods into this framework. We believe that this unifying view both provides a solid foundation for understanding the intuition behind these methods, and enables future research in the area.
Researcher Affiliation: Collaboration
Ines Chami, Stanford University, Stanford, CA 94305, USA; Sami Abu-El-Haija, USC Information Sciences Institute, Marina Del Rey, CA 90292, USA; Bryan Perozzi, Google Research, New York, NY 10011, USA; Christopher Ré, Stanford University, Stanford, CA 94305, USA; Kevin Murphy, Google Research, Mountain View, CA 94043, USA
Pseudocode: No
The paper describes methods and frameworks for machine learning on graphs using prose and mathematical equations, but it does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes
We release an open-source library for GRL which includes state-of-the-art GRL methods and important graph applications, including node classification and link prediction. Our implementation is publicly available at https://github.com/google/gcnn-survey-paper.
Open Datasets: Yes
Recently, there has been progress in this direction, including new graph benchmarks with leaderboards (Hu et al., 2020; Dwivedi et al., 2020) and graph embedding libraries (Fey and Lenssen, 2019; Wang et al., 2019; Goyal and Ferrara, 2018a).
Dataset Splits: No
The paper is a survey and does not report original experimental results requiring specific dataset splits. It discusses the importance of consistent evaluation, noting that 'citation benchmarks have drawbacks since results might significantly vary based on datasets splits, or training procedures (e.g. early stopping), as shown in recent work (Shchur et al., 2018).'
Hardware Specification: No
As a survey, the paper reports no original experimental results, so it does not specify hardware for its own experiments. It includes general discussion of scaling to large graphs and the computational resources this requires (e.g., a 'Distributed Systems setup with many machines, such as Map Reduce').
Software Dependencies: No
The paper describes a taxonomy and framework for graph machine learning methods and refers to an open-source library the authors released. However, it does not list specific software dependencies with version numbers for reproducing its work.
Experiment Setup: No
As a survey and taxonomy, the paper presents no original experimental results that would require a detailed experimental setup or hyperparameter specification. It discusses hyperparameters only in the context of existing methods, e.g., 'DeepWalk and node2vec sampling strategies use hyper-parameters to control this, such as the length of the walk or ratio between breadth and depth exploration.'
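
The GraphEDM framework mentioned in the Research Type row casts GRL methods as a graph encoder producing node embeddings, decoders that reconstruct structure or predict labels, and a composite training loss. The sketch below illustrates that decomposition only; all names, shapes, and loss weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hedged sketch of the encoder-decoder view: Z = ENC(X, A) gives node
# embeddings, one decoder reconstructs the adjacency, another predicts
# labels, and training combines supervised, graph-regularization, and
# weight-regularization terms. Toy data and weights throughout.
rng = np.random.default_rng(0)

N, D, H, C = 6, 4, 3, 2                        # nodes, features, embed dim, classes
X = rng.normal(size=(N, D))                    # node features
A = (rng.random((N, N)) < 0.3).astype(float)   # toy adjacency
A = np.maximum(A, A.T)                         # symmetrize
np.fill_diagonal(A, 0)
y = rng.integers(0, C, size=N)                 # toy node labels

W_enc = rng.normal(scale=0.1, size=(D, H))
W_dec = rng.normal(scale=0.1, size=(H, C))

def encode(X, A, W):
    # one GCN-style propagation step: average over neighbors (with
    # self-loops), project, apply a nonlinearity
    A_hat = A + np.eye(len(A))
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.tanh((A_hat / deg) @ X @ W)

Z = encode(X, A, W_enc)                        # embeddings, shape (N, H)

# graph decoder: inner-product similarity compared against the adjacency
S = Z @ Z.T
loss_graph = np.mean((S - A) ** 2)

# label decoder: softmax classifier on the embeddings
logits = Z @ W_dec
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
loss_sup = -np.mean(np.log(p[np.arange(N), y] + 1e-12))

# composite loss: supervised + graph regularization + weight regularization
# (the weights alpha, beta, gamma are assumed values for illustration)
alpha, beta, gamma = 1.0, 0.5, 1e-4
loss = (alpha * loss_sup
        + beta * loss_graph
        + gamma * (np.sum(W_enc**2) + np.sum(W_dec**2)))
print(float(loss))
```

Different GRL methods then correspond to different choices of encoder, decoder, and loss weights, e.g. an unsupervised embedding method keeps only the graph-reconstruction term, while a semi-supervised GNN emphasizes the label term.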