Plug-and-Play Dual-Tree Algorithm Runtime Analysis

Authors: Ryan R. Curtin, Dongryeol Lee, William B. March, Parikshit Ram

JMLR 2015 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental However, for most real-world datasets with the cover tree implementation in mlpack (Curtin et al., 2013a) and the reference implementation (Beygelzimer et al., 2006), the tree imbalance is near-linear with the number of points. We have constructed cover trees on N uniformly subsampled points from a variety of datasets and calculated the imbalance; see Table 2 for the results. Ten trials were performed for each dataset and each N, and the mean imbalance is given.
Researcher Affiliation Collaboration Ryan R. Curtin EMAIL School of Computational Science and Engineering Georgia Institute of Technology Atlanta, GA 30332-0250, USA Dongryeol Lee EMAIL Yahoo Labs Sunnyvale, CA 94089 William B. March EMAIL Institute for Computational Engineering and Sciences University of Texas, Austin Austin, TX 78712-1229 Parikshit Ram EMAIL Skytree, Inc. Atlanta, GA 30332
Pseudocode Yes Algorithm 1 The standard pruning dual-tree traversal for cover trees. Algorithm 2 Nearest neighbor search Base Case() Algorithm 3 Nearest neighbor search Score()
Open Source Code No The text mentions 'the cover tree implementation in mlpack (Curtin et al., 2013a)', but does not explicitly state that the code for the methodology described in *this* paper is being released or provide a link to it.
Open Datasets Yes The power, susy, higgs, and covertype datasets are found in the UCI Machine Learning Repository (Bache and Lichman, 2013), the mnist dataset is from Le Cun et al. (2000), the lcdm and sdss datasets are Sloan Digital Sky Survey data (Adelman-Mc Carthy et al., 2008), and the randu dataset is randomly-generated uniformly-distributed data in 10 dimensions.
Dataset Splits No The paper states, 'We have constructed cover trees on N uniformly subsampled points from a variety of datasets,' but does not provide specific training, test, or validation dataset splits for experimental evaluation.
Hardware Specification No The paper does not explicitly describe the hardware used to run its experiments or empirical calculations.
Software Dependencies No The paper mentions 'the cover tree implementation in mlpack (Curtin et al., 2013a) and the reference implementation (Beygelzimer et al., 2006)', but it does not specify any software names with version numbers for reproducibility.
Experiment Setup No The paper describes constructing cover trees on 'N uniformly subsampled points' and performing 'Ten trials' to calculate mean tree imbalance, but it does not provide specific experimental setup details like hyperparameters or training configurations for a machine learning model.