reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Plug-and-Play Dual-Tree Algorithm Runtime Analysis

Authors: Ryan R. Curtin, Dongryeol Lee, William B. March, Parikshit Ram

JMLR 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	However, for most real-world datasets with the cover tree implementation in mlpack (Curtin et al., 2013a) and the reference implementation (Beygelzimer et al., 2006), the tree imbalance is near-linear with the number of points. We have constructed cover trees on N uniformly subsampled points from a variety of datasets and calculated the imbalance; see Table 2 for the results. Ten trials were performed for each dataset and each N, and the mean imbalance is given.
Researcher Affiliation	Collaboration	Ryan R. Curtin, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA; Dongryeol Lee, Yahoo Labs, Sunnyvale, CA 94089; William B. March, Institute for Computational Engineering and Sciences, University of Texas, Austin, Austin, TX 78712-1229; Parikshit Ram, Skytree, Inc., Atlanta, GA 30332
Pseudocode	Yes	Algorithm 1: The standard pruning dual-tree traversal for cover trees. Algorithm 2: Nearest neighbor search BaseCase(). Algorithm 3: Nearest neighbor search Score().
Open Source Code	No	The text mentions 'the cover tree implementation in mlpack (Curtin et al., 2013a)', but does not explicitly state that the code for the methodology described in this paper is being released or provide a link to it.
Open Datasets	Yes	The power, susy, higgs, and covertype datasets are found in the UCI Machine Learning Repository (Bache and Lichman, 2013), the mnist dataset is from LeCun et al. (2000), the lcdm and sdss datasets are Sloan Digital Sky Survey data (Adelman-McCarthy et al., 2008), and the randu dataset is randomly-generated uniformly-distributed data in 10 dimensions.
Dataset Splits	No	The paper states, 'We have constructed cover trees on N uniformly subsampled points from a variety of datasets,' but does not provide specific training, test, or validation dataset splits for experimental evaluation.
Hardware Specification	No	The paper does not explicitly describe the hardware used to run its experiments or empirical calculations.
Software Dependencies	No	The paper mentions 'the cover tree implementation in mlpack (Curtin et al., 2013a) and the reference implementation (Beygelzimer et al., 2006)', but it does not give version numbers for either implementation or for any other software dependency.
Experiment Setup	No	The paper describes constructing cover trees on 'N uniformly subsampled points' and performing 'Ten trials' to calculate mean tree imbalance, but it does not provide specific experimental setup details like hyperparameters or training configurations for a machine learning model.
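
The sampling protocol quoted in the table (N uniformly subsampled points, ten trials per N, mean reported) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `mean_statistic_over_trials` and `coordinate_range` are hypothetical, and `coordinate_range` is only a stand-in statistic. Reproducing the paper's numbers would require replacing it with a function that builds a cover tree on the sample and returns its imbalance, which is not shown here.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def mean_statistic_over_trials(
    points: Sequence[Point],
    sizes: Sequence[int],
    statistic: Callable[[List[Point]], float],
    trials: int = 10,
    seed: int = 0,
) -> Dict[int, float]:
    """For each N in `sizes`, draw `trials` uniform subsamples of N points
    (without replacement) and return the mean of `statistic` per N."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same samples
    means: Dict[int, float] = {}
    for n in sizes:
        if n > len(points):
            raise ValueError(f"cannot sample {n} points from {len(points)}")
        values = [statistic(rng.sample(list(points), n)) for _ in range(trials)]
        means[n] = statistics.fmean(values)
    return means


def coordinate_range(sample: List[Point]) -> float:
    """Stand-in statistic (NOT cover tree imbalance): spread of the first
    coordinate. A real run would build a cover tree here instead."""
    first = [p[0] for p in sample]
    return max(first) - min(first)


if __name__ == "__main__":
    # Synthetic data in the spirit of the 'randu' description:
    # uniformly distributed points in 10 dimensions.
    data_rng = random.Random(42)
    data = [tuple(data_rng.random() for _ in range(10)) for _ in range(2000)]
    for n, mean_value in mean_statistic_over_trials(
        data, sizes=[100, 500, 1000], statistic=coordinate_range
    ).items():
        print(n, round(mean_value, 4))
```

Passing the statistic in as a callable keeps the sampling and averaging logic independent of any particular tree implementation, so the same loop could wrap either of the two implementations the paper cites.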