Relating graph auto-encoders to linear models

Authors: Solveig Klepper, Ulrike von Luxburg

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our work, we prove that the solution space induced by graph auto-encoders is a subset of the solution space of a linear map. This demonstrates that linear embedding models have at least the representational power of graph auto-encoders based on graph convolutional networks. So why are we still using nonlinear graph auto-encoders? ... Our experiments are aligned with other empirical work on this question and show that the linear encoder can outperform the nonlinear encoder when using feature information.
Researcher Affiliation | Academia | Solveig Klepper (EMAIL), Department of Computer Science and Tübingen AI Center, University of Tübingen; Ulrike von Luxburg (EMAIL), Department of Computer Science and Tübingen AI Center, University of Tübingen
Pseudocode | No | The paper describes methods using mathematical equations and text, and provides architectural diagrams (Figure 1), but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code for the ReLU graph auto-encoder is publicly available under an MIT license.
Open Datasets | Yes | As real world datasets we consider three standard benchmarks: Cora, Citeseer, and Pubmed.
Dataset Splits | Yes | Every graph is split into train, validation and test sets with a ratio of 70/10/20.
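The 70/10/20 split could be reproduced with a sketch like the following. The exact split procedure, node count, and random seed are assumptions for illustration, not taken from the paper:

```python
# Hypothetical 70/10/20 train/validation/test split over node indices.
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily
n = 1000                         # placeholder node count
idx = rng.permutation(n)

# Cut the shuffled indices at the 70% and 80% marks.
train, val, test = np.split(idx, [int(0.7 * n), int(0.8 * n)])
assert len(train) == 700 and len(val) == 100 and len(test) == 200
```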
Hardware Specification | Yes | We run the experiments on an internal cluster on Intel Xeon CPU E5-2650 v4 and GeForce GTX 1080 Ti. All experiments on the synthetic dataset take about 9 hours on a single CPU and a single GPU. Experiments for Cora and Citeseer take about 4 and 5 hours respectively. For the Pubmed dataset, which is the largest one, running all 8 setups took about 4 days on a single CPU and two GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation of the models or experiments.
Experiment Setup | Yes | We use the same objective function as Kipf & Welling (2016b), weighting the edges to tackle the sparsity of the graph and regularizing by their mean squared norm to prevent the point embeddings from diverging. We train using gradient descent and the Adam optimizer for 200 epochs. ... Similar to previous work, we embed the nodes into 16 dimensions when considering the link prediction task. For the node prediction task, embedding into 16 dimensions turns out to make the task too simple. We thus use 4 as the embedding dimension.
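The setup above can be sketched in PyTorch as follows. This is not the authors' code: the graph, features, learning rate, and regularization weight are placeholder assumptions; only the linear encoder, inner-product decoder, edge reweighting, mean-squared-norm regularizer, Adam optimizer, 200 epochs, and 16-dimensional embedding come from the description:

```python
# Minimal sketch of training a linear graph auto-encoder (toy data).
import torch

torch.manual_seed(0)
n, d, k = 20, 8, 16                  # nodes, feature dim, embedding dim (16 as in the paper)
X = torch.randn(n, d)                # placeholder node features
A = (torch.rand(n, n) < 0.1).float()
A = ((A + A.T) > 0).float()          # symmetric placeholder adjacency

# Symmetrically normalized adjacency with self-loops, as in GCN-style encoders.
A_hat = A + torch.eye(n)
d_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))
A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt

W = torch.zeros(d, k, requires_grad=True)
torch.nn.init.xavier_uniform_(W)
opt = torch.optim.Adam([W], lr=0.01)  # learning rate is an assumption

# Weight positive edges more heavily to counter graph sparsity.
pos_weight = (n * n - A.sum()) / A.sum()

for epoch in range(200):              # 200 epochs as stated
    opt.zero_grad()
    Z = A_norm @ X @ W                # linear encoder: no nonlinearity
    logits = Z @ Z.T                  # inner-product decoder
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, A, pos_weight=pos_weight)
    loss = loss + 1e-3 * Z.pow(2).sum(1).mean()  # mean-squared-norm regularizer (weight assumed)
    loss.backward()
    opt.step()
```

Swapping the encoder line for `Z = torch.relu(A_norm @ X) @ W`-style layers would give the nonlinear GCN variant the paper compares against.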