Random Rotation Ensembles

Authors: Rico Blaser, Piotr Fryzlewicz

JMLR 2016

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce a method that is simple to implement yet general and effective in improving ensemble diversity with only modest impact on the accuracy of the individual base learners. By randomly rotating the feature space prior to inducing the base learners, we achieve favorable aggregate predictions on standard data sets compared to state of the art ensemble methods, most notably for tree-based ensembles, which are particularly sensitive to rotation. Keywords: feature rotation, ensemble diversity, smooth decision boundary
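The quoted abstract describes the core idea: rotate the feature space, then induce an axis-aligned base learner in the rotated coordinates. The toy sketch below (not the paper's code; 2-D only, decision stumps as stand-in base learners, all names invented) illustrates why rotation helps axis-aligned learners on an oblique decision boundary:

```python
import math, random

def rotate2d(x, theta):
    """Apply a 2-D rotation by angle theta to the point x = (x1, x2)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def fit_stump(X, y):
    """Axis-aligned decision stump: pick the (feature, threshold, sign)
    with the fewest training errors."""
    best = None
    for j in (0, 1):
        for t in sorted(set(x[j] for x in X)):
            for sign in (1, -1):
                err = sum((sign if x[j] >= t else -sign) != yi
                          for x, yi in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    _, j, t, sign = best
    return lambda x: sign if x[j] >= t else -sign

def rr_ensemble(X, y, n_learners=25, seed=0):
    """Train each stump in its own randomly rotated coordinate system."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_learners):
        theta = rng.uniform(0, 2 * math.pi)   # fresh rotation per learner
        Xr = [rotate2d(x, theta) for x in X]
        members.append((theta, fit_stump(Xr, y)))
    def predict(x):
        votes = sum(stump(rotate2d(x, th)) for th, stump in members)
        return 1 if votes >= 0 else -1
    return predict

# Diagonal boundary sign(x1 + x2): hard for one axis-aligned stump,
# easy for a vote over stumps trained in rotated coordinates.
pts = [(a / 5.0, b / 5.0) for a in range(-5, 6) for b in range(-5, 6)]
data = [p for p in pts if abs(p[0] + p[1]) > 0.2]
labels = [1 if p[0] + p[1] > 0 else -1 for p in data]
clf = rr_ensemble(data, labels)
acc = sum(clf(p) == l for p, l in zip(data, labels)) / len(data)
```

A single axis-aligned stump cannot represent the diagonal boundary, but a majority vote over stumps fitted in random rotations of the plane approximates it closely, which is the diversity mechanism the abstract describes.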
Researcher Affiliation | Academia | Rico Blaser (EMAIL), Piotr Fryzlewicz (EMAIL), Department of Statistics, London School of Economics, Houghton Street, London WC2A 2AE, UK
Pseudocode | Yes | The necessary modifications are illustrated in pseudo code in Listing 1 below. All methods tested use classification or regression trees that divide the predictor space into disjoint regions Gj, where 1 ≤ j ≤ J, with J denoting the total number of terminal nodes of the tree. Extending the notation in Hastie et al. (2009), we represent a tree as
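The notation quoted above treats a fitted tree as a sum of region indicators: the prediction is the constant cj attached to whichever region Gj contains x. A minimal sketch of that representation (the regions and constants here are made up for illustration):

```python
# Each region G_j is an axis-aligned box (per-feature lo/hi bounds) with
# a constant prediction c_j; together the regions partition the space.
regions = [
    ({"lo": (0.0, 0.0), "hi": (0.5, 1.0)}, 0.2),   # G_1, c_1
    ({"lo": (0.5, 0.0), "hi": (1.0, 0.4)}, 0.7),   # G_2, c_2
    ({"lo": (0.5, 0.4), "hi": (1.0, 1.0)}, 1.0),   # G_3, c_3
]

def tree_predict(x):
    """T(x) = sum_j c_j * 1{x in G_j}: because the regions are
    disjoint, exactly one indicator fires."""
    for box, c in regions:
        if all(lo <= xi < hi for lo, xi, hi in zip(box["lo"], x, box["hi"])):
            return c
    raise ValueError("x lies outside the partitioned space")
```

Rotating the feature space changes which axis-aligned boxes the induction algorithm can carve out, which is why tree ensembles are particularly sensitive to rotation.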
Open Source Code | Yes | For this reason, we provide random rotation code in C/C++ and R in Appendix A, which can be used as a basis for enhancing existing software packages.
Open Datasets | Yes | For our comparative study of random rotation, we selected UCI data sets (Bache and Lichman, 2013) that are commonly used in the machine learning literature in order to make the results easier to interpret and compare. Table 5 in Appendix C summarizes the data sets, including relevant dimensional information.
Dataset Splits | Yes | For each experiment we performed a random 70-30 split of the data: 70% served as training data and the remaining 30% as testing data. The split was performed uniformly at random, but enforcing the constraint that at least one observation of each category level had to be present in the training data for categorical variables. This constraint was necessary to avoid situations where the testing data contained category levels that were absent from the training set. Experiments were repeated 100 times (with different random splits) and the average performance was recorded.
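The paper does not state how the category-level constraint was enforced; rejection sampling (redraw the split until the constraint holds) is one straightforward reading. A hypothetical sketch under that assumption:

```python
import random

def constrained_split(rows, cat_cols, train_frac=0.7, seed=0):
    """Random train/test split, redrawn until every level of each
    categorical column appears at least once in the training part.
    NOTE: rejection sampling is an assumption, not the paper's stated
    mechanism."""
    rng = random.Random(seed)
    n_train = int(round(train_frac * len(rows)))
    full_levels = {c: {r[c] for r in rows} for c in cat_cols}
    while True:
        idx = list(range(len(rows)))
        rng.shuffle(idx)
        train = [rows[i] for i in idx[:n_train]]
        if all({r[c] for r in train} == full_levels[c] for c in cat_cols):
            test = [rows[i] for i in idx[n_train:]]
            return train, test

# Toy data: column 0 is categorical, with a rare level "z" that an
# unconstrained split could easily leave out of the training part.
rows = [("x", i) for i in range(7)] + [("y", 7), ("y", 8), ("z", 9)]
train, test = constrained_split(rows, cat_cols=[0])
```

With a rare level, an unconstrained 70-30 split would place it entirely in the test set roughly 30% of the time, producing exactly the unseen-level problem the quoted text describes.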
Hardware Specification | Yes | The C++ code takes less than 0.5 seconds on a single core of an Intel Xeon E5-2690 CPU to generate a 1000x1000 random rotation matrix.
Software Dependencies | Yes | It uses the Eigen template library (Guennebaud et al., 2010) and a Mersenne Twister (Matsumoto and Nishimura, 1998) pseudorandom number generator.
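The appendix implementation uses Eigen's linear algebra for this step; the standard construction of a uniformly random rotation (orthonormalise a Gaussian matrix, then fix the determinant sign) can be sketched without any dependencies. This is a dependency-free sketch of the technique, not the paper's code; conveniently, CPython's random.Random is itself a Mersenne Twister:

```python
import math, random

def det(m):
    """Determinant by Laplace expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def haar_rotation(d, seed=0):
    """Draw a d x d rotation matrix uniformly (Haar measure):
    orthonormalise the columns of a Gaussian random matrix via
    classical Gram-Schmidt (a thin QR factorisation), then flip one
    column if the determinant is -1, so that det(R) = +1."""
    rng = random.Random(seed)        # Mersenne Twister under the hood
    cols = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in cols:               # remove components along earlier columns
            proj = sum(a * b for a, b in zip(u, v))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    if det(cols) < 0:                # det(A) = det(A^T), so cols-as-rows works
        cols[0] = [-a for a in cols[0]]
    return cols                      # R returned as a list of orthonormal columns
```

For the 1000x1000 matrices mentioned above one would use a proper QR routine (as the Eigen-based appendix code does) rather than classical Gram-Schmidt, which is numerically weaker and slower at that scale.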
Experiment Setup | Yes | In all cases we used default parameters for the tree induction algorithms, except that we built 5000 trees for each ensemble in the hope of achieving full convergence. To evaluate the performance of random rotations, we ranked each method for each data set and computed the average rank across all data sets.
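The average-rank aggregation described above can be sketched as follows (averaging tied ranks is an assumption; the paper does not specify its tie handling, and all names here are invented):

```python
def average_ranks(error_table):
    """error_table[dataset][method] -> test error.  Rank the methods
    within each data set (1 = lowest error; ties receive the mean of
    the tied ranks) and average the ranks across data sets."""
    methods = list(next(iter(error_table.values())).keys())
    totals = {m: 0.0 for m in methods}
    for errs in error_table.values():
        ordered = sorted(methods, key=lambda m: errs[m])
        i = 0
        while i < len(ordered):
            j = i                      # extend j over a run of tied errors
            while j + 1 < len(ordered) and errs[ordered[j + 1]] == errs[ordered[i]]:
                j += 1
            mean_rank = (i + 1 + j + 1) / 2.0   # ranks are 1-based
            for k in range(i, j + 1):
                totals[ordered[k]] += mean_rank
            i = j + 1
    n = len(error_table)
    return {m: totals[m] / n for m in methods}

# Hypothetical error table for three methods on three data sets.
table = {
    "d1": {"A": 0.10, "B": 0.20, "C": 0.30},
    "d2": {"A": 0.20, "B": 0.10, "C": 0.30},
    "d3": {"A": 0.10, "B": 0.10, "C": 0.20},   # A and B tied on d3
}
avg = average_ranks(table)
```

Averaging ranks rather than raw errors keeps data sets with very different error scales from dominating the comparison.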