Intrinsic Dimension for Large-Scale Geometric Learning

Authors: Maximilian Stubbemann, Tom Hanika, Friedrich Martin Schneider

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type: Experimental. "In particular, we propose a principled way to incorporate neighborhood information, as in graph data, into the ID. This allows for new insights into common graph learning procedures, which we illustrate by experiments on the Open Graph Benchmark. We subsequently apply our method to seven real-world datasets and relate the obtained results to the observed performances of classification procedures. Thus, we demonstrate the practical computability of our approach. In addition, we study the extent to which the intrinsic dimension reveals insights into the performance of particular classes of Graph Neural Networks."
Researcher Affiliation: Academia. Maximilian Stubbemann (EMAIL), Knowledge & Data Engineering Group, University of Kassel, Kassel, Germany; Tom Hanika (EMAIL), Knowledge & Data Engineering Group, University of Kassel, Kassel, Germany; Friedrich Martin Schneider (EMAIL), Institute of Discrete Mathematics and Algebra, TU Bergakademie Freiberg, Freiberg, Germany.
Pseudocode: Yes.
Algorithm 1: Compute ∂(D) for a finite geometric dataset D = (X, µ, F).
Input: finite geometric dataset D = (X, µ, F). Output: ∂(D).
  forall f ∈ F do
    compute the feature sequence l_{f,D}
  forall k ∈ {2, ..., |X|} do
    forall f ∈ F do
      φ_{k,f}(D) = min_{j ∈ {0, ..., |X|−k}} (l_{f,D,k+j} − l_{f,D,1+j})
    ∂(D) += max_{f ∈ F} φ_{k,f}(D)
  ∂(D) = (1/|X|) ∂(D)
  return ∂(D)
...
Algorithm 2: Compute ∂_{s,−}(D), ∂_{s,+}(D), ∂(D) for a finite GD D = (X, µ, F).
Input: finite GD D = (X, µ, F), support sequence s = (2 = s_1, ..., s_l = |X|), exact (Boolean). Output: ∂_{s,−}(D), ∂_{s,+}(D), ∂(D).
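Algorithm 1 above can be sketched in a few lines of Python. The feature sequences l_{f,D} are assumed to be precomputed (their construction is dataset-specific), and the function name below is illustrative, not taken from the authors' code:

```python
import numpy as np

def intrinsic_dimension(feature_seqs):
    """Sketch of Algorithm 1: `feature_seqs` maps each feature f to its
    precomputed, non-decreasing feature sequence l_{f,D} of length |X|."""
    seqs = [np.asarray(l, dtype=float) for l in feature_seqs.values()]
    n = len(seqs[0])  # |X|
    total = 0.0
    for k in range(2, n + 1):
        # phi_{k,f}(D) = min over j in {0,...,n-k} of l_{f,D,k+j} - l_{f,D,1+j}
        # (0-based: l[k-1+j] - l[j]), computed vectorized per feature
        phis = [np.min(l[k - 1:] - l[:n - k + 1]) for l in seqs]
        total += max(phis)  # accumulate max_f phi_{k,f}(D)
    return total / n  # normalize by |X|, as in the pseudocode
```

For a single linearly growing feature sequence of length 4, the inner minimum for k = 2, 3, 4 is 1, 2, 3, so the result is (1 + 2 + 3) / 4 = 1.5.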
Open Source Code: Yes. "Our code is publicly available on GitHub: https://github.com/mstubbemann/ID4Geo"
Open Datasets: Yes. "This allows for new insights into common graph learning procedures, which we illustrate by experiments on the Open Graph Benchmark. ... The statistics for Cora, PubMed and CiteSeer were taken from PyTorch Geometric. The statistics of the OGB datasets were taken from the Open Graph Benchmark. ... PubMed, Cora and CiteSeer (Yang et al., 2016), which we retrieved from PyTorch Geometric (Fey & Lenssen, 2019). ... the well-known, large-scale ogbn-mag-papers100M dataset."
Dataset Splits: Yes. "For PubMed, Cora and CiteSeer, we train on the classification task provided by PyTorch Geometric (Fey & Lenssen, 2019) which was earlier studied by Yang et al. (2016). All Open Graph Benchmark datasets are trained and tested on the official node property prediction task (https://ogb.stanford.edu/docs/nodeprop/)."
Hardware Specification: Yes. "On our Xeon Gold system with 16 cores, approximating the ID of a k-hop geometric dataset built from ogbn-mag-papers100M is possible within a few hours."
Software Dependencies: No. "For all tasks, we use a simple SIGN model (Rossi et al., 2020) ... Implementation details and parameter choices can be found in Appendix A.1. ... For all models, we use an Adam optimizer with weight decay of 0.0001. ... We implement the MLE by using the NearestNeighbors class of scikit-learn (Pedregosa et al., 2011)."
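The MLE baseline referenced here is the Levina–Bickel maximum-likelihood estimator of intrinsic dimension. A minimal NumPy-only sketch is shown below; the paper's implementation uses scikit-learn's NearestNeighbors for the neighbor search, and the function name here is illustrative:

```python
import numpy as np

def mle_intrinsic_dimension(X, k=5):
    """Levina-Bickel MLE of intrinsic dimension (NumPy-only sketch;
    a brute-force pairwise search replaces sklearn's NearestNeighbors)."""
    X = np.asarray(X, dtype=float)
    # pairwise Euclidean distances between all points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # sort each row; drop column 0 (distance of each point to itself)
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # per-point estimate: inverse of the mean log-ratio of the k-th
    # neighbor distance to the 1st, ..., (k-1)-th neighbor distances
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    return 1.0 / np.mean(np.sum(log_ratios, axis=1) / (k - 1))
```

On points sampled from a one-dimensional manifold, the estimate should come out close to 1; as with all k-NN estimators, it is biased for small k and degenerates if duplicate points produce zero distances.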
Experiment Setup: Yes. "For all tasks, we use a simple SIGN model (Rossi et al., 2020) with one hidden inception layer and one classification layer. For PubMed, CiteSeer and Cora, we use batch sizes of 256, a hidden layer size of 64 and dropout at the input and hidden layer with 0.5. The learning rate is set to 0.01. ... For ogbn-arxiv, we use a hidden dimension of 512, dropout at the input with 0.1 and with 0.5 at the hidden layer. For ogbn-mag, we use a hidden dimension of 512, no dropout at the input and dropout with 0.5 at the hidden layer. For ogbn-products, we use a hidden dimension of 512, input dropout of 0.3 and hidden layer dropout of 0.4. For all ogbn tasks, the learning rate is 0.001 and the batch size 50000. For all experiments, we train for a maximum of 1000 epochs with early stopping on the validation accuracy. Here, we use a patience of 15. ... For all models, we use an Adam optimizer with weight decay of 0.0001."
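The early-stopping rule described above (stop after 15 consecutive epochs without a validation-accuracy improvement, up to 1000 epochs) can be sketched as a small framework-independent helper; the class name is illustrative, not taken from the authors' code:

```python
class EarlyStopping:
    """Stop training once validation accuracy has not improved
    for `patience` consecutive epochs (the paper uses patience=15)."""

    def __init__(self, patience=15):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc):
        """Record one epoch's validation accuracy; return True to stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `if stopper.step(val_acc): break` after each epoch reproduces the described behavior.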