RaSE: Random Subspace Ensemble Classification

Authors: Ye Tian, Yang Feng

JMLR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental An array of simulations under various models and real-data applications demonstrate the effectiveness and robustness of the Ra SE classifier and its iterative version in terms of low misclassification rate and accurate feature ranking.
Researcher Affiliation Academia Ye Tian, Department of Statistics, Columbia University, New York, NY 10027, USA; Yang Feng, Department of Biostatistics, School of Global Public Health, New York University, New York, NY 10003, USA
Pseudocode Yes Algorithm 1: Random subspace ensemble classification (RaSE); Algorithm 2: Iterative RaSE (RaSE_T)
Open Source Code Yes The RaSE algorithm is implemented in the R package RaSEn on CRAN. 6.1 Summary In this work, we introduce a flexible ensemble classification framework named RaSE, which is designed to solve the sparse classification problem. To select the optimal subspace for each weak learner, we define a new information criterion, the ratio information criterion (RIC), based on Kullback-Leibler divergence, and it is shown to achieve screening consistency and weak consistency under some general model conditions. This guarantees that each weak learner is trained using features in the minimal discriminative set with high probability for a sufficiently large sample size and number of random subspaces. We also investigate the consistency of RIC for LDA and QDA models under some specific conditions. The theoretical analysis of RaSE classifiers with specific base classifiers is conducted. In addition, we present two versions of RaSE algorithms, that is, the vanilla version (RaSE) and the iterative version (RaSE_T). We also apply RaSE for feature ranking based on the average selected percentage of features in subspaces. Theoretical analysis shows that when the stepwise detectable condition holds and the signal is sufficiently sparse, the iterative RaSE can cover the minimal discriminative set with high probability after a few iterations, with the required number of random subspaces smaller than that for the vanilla RaSE. Multiple numerical experiments, including simulations and real data, verify that RaSE is a favorable classification method for sparse classification problems. The RaSE algorithms are available in the R package RaSEn (https://cran.r-project.org/web/packages/RaSEn/index.html).
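As a rough illustration of the ensemble scheme described in Algorithm 1, the sketch below implements the random-subspace idea in Python. This is not the authors' RaSEn implementation: the nearest-centroid base learner and the training-error criterion are stand-ins for the paper's base classifiers and the RIC, and all function names are ours.

```python
import numpy as np

def rase_fit_predict(X_tr, y_tr, X_te, B1=20, B2=50, D=None, rng=None):
    """Sketch of vanilla RaSE: for each of B1 weak learners, draw B2
    random subspaces, keep the one minimizing a criterion (here,
    training error as a stand-in for RIC), then aggregate votes."""
    rng = np.random.default_rng(rng)
    n, p = X_tr.shape
    D = D or min(p, int(np.sqrt(n)))  # default subspace-size upper bound
    votes = np.zeros(len(X_te))
    for _ in range(B1):
        best_err, best_S = np.inf, None
        for _ in range(B2):
            d = rng.integers(1, D + 1)                # d ~ Uniform{1,...,D}
            S = rng.choice(p, size=d, replace=False)  # uniform d-subset
            err = centroid_error(X_tr[:, S], y_tr)
            if err < best_err:
                best_err, best_S = err, S
        votes += centroid_predict(X_tr[:, best_S], y_tr, X_te[:, best_S])
    return (votes / B1 > 0.5).astype(int)  # fixed threshold alpha = 0.5

def centroid_predict(Xtr, ytr, X):
    """Toy base learner: classify to the nearer class centroid."""
    m0 = Xtr[ytr == 0].mean(axis=0)
    m1 = Xtr[ytr == 1].mean(axis=0)
    d0 = ((X - m0) ** 2).sum(axis=1)
    d1 = ((X - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def centroid_error(Xtr, ytr):
    return (centroid_predict(Xtr, ytr, Xtr) != ytr).mean()
```

In the actual package, the subspace criterion, the decision threshold, and the base classifiers are all configurable; this sketch only shows the two-level structure (B1 learners, B2 candidate subspaces each) the report refers to.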
Open Datasets Yes The madelon data set (http://archive.ics.uci.edu/ml/datasets/madelon) is an artificial data set containing data points grouped in 32 clusters placed on the vertices of a five-dimensional hypercube and randomly labeled as 0 or 1 (Guyon et al., 2005). The musk data set (https://archive.ics.uci.edu/ml/datasets/Musk+(Version+2)) contains 6598 observations with 5581 non-musk (class 0) and 1017 musk (class 1) molecules. The mice protein expression data set (https://archive.ics.uci.edu/ml/datasets/Mice+Protein+Expression) contains 1080 instances with 570 healthy mice (class 0) and 510 mice with Down syndrome (class 1). There are 77 features representing the expression of 77 different proteins (Higuera et al., 2015). The hand-written digits recognition data set (https://archive.ics.uci.edu/ml/datasets/Multiple+Features) consists of features of hand-written numerals (0-9) extracted from a collection of Dutch utility maps (Dua and Graff, 2019).
Dataset Splits Yes Here p = 400, and the training sample size n ∈ {200, 400, 1000}. Test data of size 1000 is independently generated from the same model. The training sample size is set as n ∈ {200, 500, 1000} for three different settings, and each time the remaining data is used as test data. The training sample size is set to be 200, 500, 1000 for each setting, and the remaining observations are used as the test data. Training samples of size 200, 500, 800 are considered, and the remaining observations are set as the test data. There are 400 observations, 200 of which belong to class 0, and the remaining 200 belong to class 1. Training samples of size 50, 100, 200 are used, and the remaining data is used as the test data.
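The recurring "train on n observations, test on the rest" scheme described above can be sketched with a small helper (illustrative only; the paper does not specify the exact splitting code):

```python
import numpy as np

def split_train_rest(n_total, n_train, seed=0):
    """Randomly designate n_train observations as the training set and
    use all remaining observations as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    return idx[:n_train], idx[n_train:]
```

For example, with the mice protein data (1080 instances) and a training size of 500, the remaining 580 observations form the test set.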
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments. It only discusses computational complexity.
Software Dependencies No The standard LDA and QDA methods are implemented using the R package MASS. The kNN classifier is implemented by knn and knn.cv in the R package class and the function knn3 in the R package caret. We utilize the package dsda to fit the sLDA model. RAMP is implemented through the package RAMP. For the RF, we use the R package randomForest; the number of trees is set as 500 (the default) and ⌊√p⌋ variables are randomly selected when training each tree (the default). The NSC model is fitted by calling the function pamr.train in the package pamr. RPEnsemble is implemented by the R package RPEnsemble. To obtain the MLE of the parameters in the Gamma distribution, Newton's method is applied via the function nlm in the package stats, with the initial point chosen as the moment estimator. To get the non-parametric estimate of the KL divergence and RIC, we call the function KL.divergence in the package FNN. The paper lists several R packages used but does not specify their version numbers.
Experiment Setup Yes In our implementation, we set B1 = 200 and B2 = 500 as defaults. For the LDA and QDA base classifiers, the criterion C is set to choose the optimal subspace by minimizing the RIC, while for kNN, the default setting is minimizing the leave-one-out cross-validation error. Without prior information about the features, as mentioned in Section 2.1, the subspace distribution D is set as the hierarchical uniform distribution over the subspaces. To generate the size d of subspaces from the uniform distribution over {1, . . . , D}, the upper bound D has to be determined. In practice, for the QDA base classifier we set D = min(p, ⌊√n0⌋, ⌊√n1⌋), and for LDA, kNN, and all other base classifiers, we set D = min(p, ⌊√n⌋), where ⌊a⌋ denotes the largest integer not larger than a. The threshold α̂ is chosen by (2) to minimize the training error. When using the non-parametric estimate of RIC corresponding to (6), following Wang et al. (2009) and Ganguly et al. (2018), we set k0 = ⌊√n0⌋ and k1 = ⌊√n1⌋ to satisfy the conditions they presented for proving consistency. For the RF, we use the R package randomForest; the number of trees is set as 500 (the default) and ⌊√p⌋ variables are randomly selected when training each tree (the default). When fitting the standard kNN classifier, and the kNN base classifier in the RPEnsemble and RaSE methods, the number of neighbors k is chosen from {3, 5, 7, 9, 11} via leave-one-out cross-validation, following Cannings and Samworth (2017).
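The default subspace-size settings above (D = min(p, ⌊√n0⌋, ⌊√n1⌋) for QDA; D = min(p, ⌊√n⌋) otherwise; then d ~ Uniform{1, ..., D} followed by a uniform d-subset of features) can be made concrete in a short sketch. The helper names here are ours, not the RaSEn package's API:

```python
import math
import random

def default_D(n0, n1, p, base="lda"):
    """Default upper bound D on subspace sizes, per the setup above:
    QDA uses min(p, floor(sqrt(n0)), floor(sqrt(n1))); other base
    classifiers use min(p, floor(sqrt(n))) with n = n0 + n1."""
    if base == "qda":
        return min(p, math.isqrt(n0), math.isqrt(n1))
    return min(p, math.isqrt(n0 + n1))

def sample_subspace(p, D, rng):
    """Hierarchical uniform draw: first a size d ~ Uniform{1, ..., D},
    then a uniform random d-subset of the p features."""
    d = rng.randrange(1, D + 1)
    return sorted(rng.sample(range(p), d))
```

For instance, with class sample sizes n0 = 100 and n1 = 81 and p = 400, the QDA default gives D = min(400, 10, 9) = 9.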