Plug-and-Play Dual-Tree Algorithm Runtime Analysis
Authors: Ryan R. Curtin, Dongryeol Lee, William B. March, Parikshit Ram
JMLR 2015 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | However, for most real-world datasets with the cover tree implementation in mlpack (Curtin et al., 2013a) and the reference implementation (Beygelzimer et al., 2006), the tree imbalance is near-linear with the number of points. We have constructed cover trees on N uniformly subsampled points from a variety of datasets and calculated the imbalance; see Table 2 for the results. Ten trials were performed for each dataset and each N, and the mean imbalance is given. |
| Researcher Affiliation | Collaboration | Ryan R. Curtin EMAIL School of Computational Science and Engineering Georgia Institute of Technology Atlanta, GA 30332-0250, USA Dongryeol Lee EMAIL Yahoo Labs Sunnyvale, CA 94089 William B. March EMAIL Institute for Computational Engineering and Sciences University of Texas, Austin Austin, TX 78712-1229 Parikshit Ram EMAIL Skytree, Inc. Atlanta, GA 30332 |
| Pseudocode | Yes | Algorithm 1 The standard pruning dual-tree traversal for cover trees. Algorithm 2 Nearest neighbor search Base Case() Algorithm 3 Nearest neighbor search Score() |
| Open Source Code | No | The text mentions 'the cover tree implementation in mlpack (Curtin et al., 2013a)', but does not explicitly state that the code for the methodology described in *this* paper is being released or provide a link to it. |
| Open Datasets | Yes | The power, susy, higgs, and covertype datasets are found in the UCI Machine Learning Repository (Bache and Lichman, 2013), the mnist dataset is from Le Cun et al. (2000), the lcdm and sdss datasets are Sloan Digital Sky Survey data (Adelman-Mc Carthy et al., 2008), and the randu dataset is randomly-generated uniformly-distributed data in 10 dimensions. |
| Dataset Splits | No | The paper states, 'We have constructed cover trees on N uniformly subsampled points from a variety of datasets,' but does not provide specific training, test, or validation dataset splits for experimental evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments or empirical calculations. |
| Software Dependencies | No | The paper mentions 'the cover tree implementation in mlpack (Curtin et al., 2013a) and the reference implementation (Beygelzimer et al., 2006)', but it does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | No | The paper describes constructing cover trees on 'N uniformly subsampled points' and performing 'Ten trials' to calculate mean tree imbalance, but it does not provide specific experimental setup details like hyperparameters or training configurations for a machine learning model. |