Sparse Projection Oblique Randomer Forests
Authors: Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Carey E. Priebe, Jason Yim, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce yet another decision forest, called Sparse Projection Oblique Randomer Forests (SPORF). SPORF significantly improves accuracy over existing state-of-the-art algorithms on a standard benchmark suite for classification with > 100 problems of varying dimension, sample size, and number of classes. To illustrate how SPORF addresses the limitations of both axis-aligned and existing oblique decision forest methods, we conduct extensive simulated experiments. SPORF typically yields improved performance over existing decision forests, while remaining computationally efficient and scalable and maintaining interpretability. Very sparse random projections can be incorporated into gradient boosted trees to obtain potentially similar gains. |
| Researcher Affiliation | Collaboration | Tyler M. Tomita, Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218, USA; James Browne, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Cencheng Shen, Department of Applied Economics and Statistics, University of Delaware, Newark, DE 19716, USA; Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, and Carey E. Priebe, Center for Imaging Science, Johns Hopkins University, Baltimore, MD 21218, USA; Jason Yim, DeepMind, 6 Pancras Square, London, UK N1C 4AG; Randal Burns, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Mauro Maggioni, Department of Mathematics, Johns Hopkins University, Baltimore, MD 21218, USA; Joshua T. Vogelstein, Institute for Computational Medicine, Kavli Neuroscience Discovery Institute, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA |
| Pseudocode | Yes | Algorithm 1 (Learning a SPORF classification tree). Input: (1) D_n = (X, y) ∈ R^(n×p) × Y^n: training data; (2) Θ: set of split eligibility criteria. Output: a SPORF decision tree T. Algorithm 2 (Finding the best node split). This function is called by growtree (Alg. 1) at every split node. For each of the p dimensions in X ∈ R^(n×p), a binary split is assessed at each location between adjacent observations. The dimension j∗ and split value τ∗ that best split the data are selected, where "best" means maximizing some choice of scoring function. In classification, the scoring function is typically the reduction in Gini impurity or entropy. The increment function called within this function updates the counts in the left and right partitions as the split is incrementally moved to the right. |
| Open Source Code | Yes | Our statistically and computationally efficient parallelized implementations are available from https://neurodata.io/sporf/ in both R and Python. Our R package is available on the Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/web/packages/rerf/), and our Python package, which is sklearn API compliant, is available from PyPI (https://pypi.org/project/rerf/2.0.5/). The repository at https://neurodata.io/sporf/ includes both the R package discussed here and a C++ version with R and Python bindings that we are actively developing. |
| Open Datasets | Yes | SPORF compares favorably to RF, XGBoost, RR-RF, and CCF on a suite of 105 benchmark data sets from the UCI machine learning repository (Figure 5). MNIST: the MNIST data set (LeCun et al.) has 60,000 training observations and 784 (28×28) features. Higgs: the Higgs data set (https://www.kaggle.com/c/higgs-boson) has 250,000 training observations and 31 features. p53: the p53 data set (https://archive.ics.uci.edu/ml/datasets/p53+Mutants) has 31,159 training observations and 5,409 features. |
| Dataset Splits | Yes | Error rates are estimated for each algorithm on each data set via five-fold cross-validation; per-data-set error rates are reported in Appendix D. Each data set was randomly divided into five partitions, stratified to preserve relative class frequencies, with each partition holding out a different 20% of the data for testing. The number of test points used for the Higgs, MNIST, and p53 data sets is 50,000, 10,000, and 6,000, respectively. Predictions were made sequentially without batching using a single core. |
| Hardware Specification | Yes | One set of experiments is run on a single Intel Xeon E5-2650 v3 processor clocked at 2.30GHz with 10 physical cores, 20 threads, and 250 GB of DDR4-2133 RAM, running Ubuntu 16.04. Another set is run on four Intel Xeon E7-4860 v2 processors clocked at 2.60GHz, each with 12 physical cores and 24 threads, with 1 TB of DDR3-1600 memory available, also running Ubuntu 16.04. |
| Software Dependencies | No | The paper mentions several software packages, including Rcpp, ranger, XGBoost, the R caret package, the R randomForest package, and a MATLAB implementation, but does not provide specific version numbers for these components. For example, it cites the "Rcpp package (Eddelbuettel, 2018)" rather than an explicit version such as "Rcpp 1.0.0". |
| Experiment Setup | Yes | Unless stated otherwise, model training and tuning for all algorithms except XGBoost and CCF is performed as follows. Each algorithm trains 500 trees, which was empirically determined to be sufficient for convergence of out-of-bag error for all methods. The split objective is to maximize the reduction in Gini impurity. In all methods, classification trees are fully grown and unpruned (i.e., nodes are split until pure). Two hyperparameters are tuned via minimization of out-of-bag error. The first is d, the number of candidate split directions evaluated at each split node: each algorithm is trained for d = p^(1/4), p^(1/2), p^(3/4), and p, and SPORF and F-RC are additionally trained for d = p^2. For RF, d is restricted to be no greater than p by definition. The second is λ, the average sparsity of univariate projections sampled at each split node; the values optimized over for SPORF and F-RC are {1/p, ..., 5/p}. For CCF, the number of trees is 500, trees are fully grown, and the split objective is to maximize the reduction in class entropy (the default objective found to perform best by the authors). The only hyperparameter tuned is the number of features subsampled prior to performing CCA, optimized over the set {p^(1/4), p^(1/2), p^(3/4), p}. Five hyperparameters of XGBoost are tuned via grid search using the R caret package (see Appendix B for details). Appendix B grid: nrounds: 100, 1000; subsample: 0.5, 0.75, 1; eta: 0.001, 0.01; colsample_bytree: 0.4, 0.6, 0.8, 1; min_child_weight: 1; max_depth: 4, 6, 8, 10, 100000. |
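The split search described in the Pseudocode row (Algorithms 1–2), combined with the sparse random projections governed by the hyperparameters d and λ from the Experiment Setup row, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names (`sample_sparse_projections`, `best_split`) are our own, and we assume nonzero projection entries are drawn from {−1, +1}, with each of the p·d entries nonzero with probability λ, as is standard for very sparse random projections.

```python
import numpy as np

def sample_sparse_projections(p, d, lam, rng):
    """Sample a p x d very sparse random projection matrix.
    Each entry is nonzero with probability lam; nonzero entries
    are drawn uniformly from {-1, +1}."""
    nnz_mask = rng.random((p, d)) < lam
    signs = rng.choice([-1.0, 1.0], size=(p, d))
    return nnz_mask * signs

def gini(counts):
    """Gini impurity of a vector of per-class counts."""
    n = counts.sum()
    if n == 0:
        return 0.0
    frac = counts / n
    return 1.0 - np.sum(frac ** 2)

def best_split(X, y, d=3, lam=None, rng=None):
    """Evaluate d candidate sparse projections and return the split
    (projection index j, threshold tau) minimizing the weighted Gini
    impurity of the two children; lam defaults to 1/p."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, p = X.shape
    lam = 1.0 / p if lam is None else lam
    classes = np.unique(y)
    A = sample_sparse_projections(p, d, lam, rng)
    Xproj = X @ A  # n x d projected data
    best = (np.inf, None, None)  # (score, projection index, threshold)
    for j in range(d):
        order = np.argsort(Xproj[:, j])
        yj = y[order]
        # Incrementally move points from the right child to the left,
        # mirroring the "increment" function in Algorithm 2.
        left = np.zeros(len(classes))
        right = np.array([(yj == c).sum() for c in classes], dtype=float)
        for i in range(n - 1):
            c_idx = np.searchsorted(classes, yj[i])
            left[c_idx] += 1
            right[c_idx] -= 1
            score = ((i + 1) * gini(left) + (n - i - 1) * gini(right)) / n
            if score < best[0]:
                tau = 0.5 * (Xproj[order[i], j] + Xproj[order[i + 1], j])
                best = (score, j, tau)
    return best
```

To recover axis-aligned RF behavior, each candidate projection would instead be a single coordinate axis; SPORF's λ ≈ 1/p makes most sampled projections nearly as sparse while still allowing oblique splits.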