Hyperbolic Random Forests
Authors: Lars Doorenbos, Pablo Márquez-Neila, Raphael Sznitman, Pascal Mettes
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on standard and new benchmarks show that our approach outperforms both conventional random forest algorithms and recent hyperbolic classifiers. We extensively evaluate our approach on canonical benchmarks from previous works on hyperbolic classification. Additionally, we introduce three new multi-class classification experiments. We show that our method is superior to both competing hyperbolic classifiers as well as their Euclidean counterparts in these settings. |
| Researcher Affiliation | Academia | Lars Doorenbos EMAIL University of Bern Pablo Márquez-Neila EMAIL University of Bern Raphael Sznitman EMAIL University of Bern Pascal Mettes EMAIL University of Amsterdam |
| Pseudocode | No | The paper describes algorithms and methods using mathematical formulas and textual descriptions, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured code-like steps. |
| Open Source Code | No | The paper mentions its review on OpenReview and uses third-party tools like HoroSVM, but it does not provide an explicit statement or a link to its own implementation code for the described methodology. |
| Open Datasets | Yes | We evaluate HoroRF on two canonical hyperbolic classification benchmarks: WordNet subtree and network node classification (Cho et al., 2019; Fan et al., 2023). We run binary classification experiments on the nouns of WordNet (Fellbaum, 2010), embedded into hyperbolic space using hyperbolic entailment cones (Ganea et al., 2018a). We run node classification experiments on four network datasets embedded into hyperbolic space. These networks are (1) karate (Zachary, 1977), (2) polblogs (Adamic & Glance, 2005), (3) football (Girvan & Newman, 2002), and (4) polbooks (http://www-personal.umich.edu/~mejn/netdata/), a network of 105 books split into three affiliations. We evaluate the methods using CIFAR10 (Krizhevsky et al., 2009) and STL10 (Coates et al., 2011). |
| Dataset Splits | Yes | The datasets are split into train, validation, and test sets with a ratio of 60:20:20. Following (Fan et al., 2023), we split the data into 80% for training and 20% for testing and report the best results from a grid search over three runs in AUPR. Following previous work (Fan et al., 2023), we run a grid search with 5-fold stratified cross-validation and report the best average micro-f1 score over five seeds, using a different embedding for each seed. |
| Hardware Specification | No | The paper discusses the runtime and computational time needed for Horo RF but does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper refers to using specific methods like HoroSVM and mentions general concepts like Gini impurity, but it does not provide version numbers for any specific software libraries, programming languages, or solvers used in the implementation. |
| Experiment Setup | Yes | We use 100 trees for all tree-based methods. We use the Gini impurity to calculate the information gain. At every node, we set C to 2^n, where n is randomly sampled from {−3, −2, . . . , 5}. We stop splitting when a node reaches a certain number of samples m. In the case of ties in information gain between possible splits, a random split is selected. If the HoroSplitter fails to converge, we choose the best horosphere from a small number of random ideal points. We set m and the class-balancing hyperparameter β via grid search. A hyperparameter search is done on the validation set, and the macro-f1 score over three trials on the test set is reported. |
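For concreteness, the per-node mechanics quoted in the Experiment Setup row — Gini impurity, information gain, and sampling C = 2^n with n drawn from {−3, −2, . . . , 5} — can be sketched in plain Python. This is a minimal illustration, not the authors' implementation; the function names are ours:

```python
import random

def gini_impurity(labels):
    """Gini impurity of a label multiset: 1 - sum_c p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def information_gain(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`.
    Ties in this quantity are broken by picking a random split."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
             + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

def sample_C(rng=random):
    """Per-node regularization strength C = 2^n, n uniform over {-3, ..., 5}."""
    return 2.0 ** rng.randint(-3, 5)
```

A pure two-class node such as `[0, 0, 1, 1]` has impurity 0.5, and a perfect split into `[0, 0]` and `[1, 1]` yields the full 0.5 gain.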
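The 60:20:20 train/validation/test protocol from the Dataset Splits row can likewise be sketched. The helper below is hypothetical (not from the paper's code) and shows one standard way to stratify such a split so class ratios are preserved:

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split indices into train/val/test sets (default 60:20:20),
    preserving the class proportions of `labels` in each set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n = len(idxs)
        n_train = int(round(fractions[0] * n))
        n_val = int(round(fractions[1] * n))
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])
    return train, val, test
```

With 50 samples per class, each class contributes 30/10/10 indices, giving the reported 60:20:20 ratio overall.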