Shedding Light on Problems with Hyperbolic Graph Learning

Authors: Isay Katsman, Anna Gilbert

TMLR 2025

Reproducibility variables, with results and the supporting LLM responses:
Research Type: Experimental
Evidence: "On several of the original most hyperbolic test tasks, we demonstrate that a simple Euclidean model outperforms or matches a variety of state-of-the-art hyperbolic (and non-trivial Euclidean) models. We perform an analysis of existing methods, and introduce a parametric family of benchmark datasets that help establish the applicability of (hyperbolic) graph neural networks." "Table 1: Above we give graph task results for a debugged version of the Euclidean model from Chami et al. (2019) compared to other models, including the current state-of-the-art models from Chen et al. (2021) and Katsman et al. (2023). The metrics reported are test ROC AUC for link prediction and test F1 score for node classification."
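The two metrics named in the evidence (test ROC AUC for link prediction, test F1 for node classification) correspond to standard scikit-learn calls. A minimal sketch with illustrative, made-up labels and scores — not the paper's data, and the macro averaging choice is an assumption, not necessarily the paper's:

```python
from sklearn.metrics import roc_auc_score, f1_score

# Link prediction: binary "edge exists" labels vs. predicted edge scores
# (illustrative values only).
edge_labels = [1, 0, 1, 1, 0, 0]
edge_scores = [0.9, 0.6, 0.7, 0.3, 0.4, 0.1]
print(f"test ROC AUC: {roc_auc_score(edge_labels, edge_scores):.3f}")  # 0.778

# Node classification: true vs. predicted class labels (illustrative values;
# 'macro' averaging is an assumption here).
true_classes = [0, 1, 2, 1, 0, 2]
pred_classes = [0, 1, 2, 0, 0, 2]
print(f"test F1 (macro): {f1_score(true_classes, pred_classes, average='macro'):.3f}")  # 0.822
```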
Researcher Affiliation: Academia
Evidence: Isay Katsman (EMAIL), Department of Applied Mathematics, Yale University; Anna Gilbert (EMAIL), Department of Applied Mathematics, Department of Statistics & Data Science, and Department of Electrical Engineering, Yale University.
Pseudocode: No
Evidence: The paper includes mathematical definitions and descriptions of methods, but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code: Yes
Evidence: "A complete repository with our code (and a bug-free version of the Euclidean multi-layer perceptron model from Chami et al. (2019)) is available at the following GitHub link. There, our README gives explicit line-by-line commands to reproduce our results."
Open Datasets: Yes
Evidence: "We demonstrate this specifically for the link prediction and node classification tasks presented originally in Chami et al. (2019), and used in a number of papers thereafter (Chen et al., 2021; Zhang et al., 2019; 2021)." The datasets used are Disease (δ = 0), Disease-M (δ = 0), and Airport (δ = 1), along with WN18RR (Bordes et al., 2013).
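The δ values quoted for these datasets are Gromov δ-hyperbolicities: δ = 0 means the graph's shortest-path metric is exactly tree-like, and larger δ means less tree-like. A stdlib-only brute-force sketch of the four-point definition, with hypothetical helper names:

```python
from itertools import combinations
from collections import deque

def all_pairs_dist(adj):
    """BFS shortest-path distances for an unweighted graph given as an adjacency dict."""
    dist = {}
    for src in adj:
        d = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    q.append(w)
        dist[src] = d
    return dist

def gromov_delta(adj):
    """Worst-case four-point Gromov delta over all quadruples (O(n^4); tiny graphs only).

    For each quadruple, sort the three pairwise distance sums; delta is half the
    gap between the two largest, maximized over quadruples. Trees give 0.
    """
    d = all_pairs_dist(adj)
    delta = 0.0
    for x, y, u, v in combinations(adj, 4):
        s = sorted([d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]])
        delta = max(delta, (s[-1] - s[-2]) / 2)
    return delta

# A path (a tree) is 0-hyperbolic; a 6-cycle is 1-hyperbolic.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(gromov_delta(path))   # 0.0
print(gromov_delta(cycle))  # 1.0
```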
Dataset Splits: No
Evidence: The paper reports "test ROC AUC for link prediction" and "test F1 score for node classification," and states that results are given over 5 trials, implying the existence of dataset splits. However, neither the main text nor the appendices explicitly state the split percentages, sample counts, or splitting methodology.
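For context, link-prediction benchmarks of this kind typically hold out a random fraction of edges for validation and testing. The sketch below is generic, with illustrative split fractions and a hypothetical helper name — the paper does not report its actual percentages:

```python
import random

def split_edges(edges, val_frac=0.05, test_frac=0.10, seed=0):
    """Randomly partition an edge list into train/val/test edge sets.

    The 5%/10% defaults are illustrative only, not the paper's protocol.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = edges[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return (shuffled[n_val + n_test:],        # train edges
            shuffled[:n_val],                 # val edges
            shuffled[n_val:n_val + n_test])   # test edges

edges = [(i, i + 1) for i in range(100)]
train, val, test = split_edges(edges)
print(len(train), len(val), len(test))  # 85 5 10
```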
Hardware Specification: Yes
Evidence: "Our runs for Table 1, specifically the MLP (debugged) row, were performed on a single NVIDIA GeForce RTX 3090 GPU."
Software Dependencies: No
Evidence: "The hyperparameters for our best runs were found with tuning via the Weights and Biases (Biewald, 2020) framework and we include the YAML configuration file we used for tuning in the repository as well." While Weights and Biases is mentioned, no version number for it or for any other key software library or framework is given in the paper.
Experiment Setup: No
Evidence: "The hyperparameters for our best runs were found with tuning via the Weights and Biases (Biewald, 2020) framework and we include the YAML configuration file we used for tuning in the repository as well." The paper notes that hyperparameters were tuned and that a configuration file is available in the repository, but the concrete hyperparameter values are not specified in the paper itself.