Are Random Forests Truly the Best Classifiers?

Authors: Michael Wainberg, Babak Alipanahi, Brendan J. Frey

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this response, we show that the study's results are biased by the lack of a held-out test set and the exclusion of trials with errors. Further, the study's own statistical tests indicate that random forests do not have significantly higher percent accuracy than support vector machines and neural networks, calling into question the conclusion that random forests are the best classifiers. ... We re-evaluated the mean percent accuracy of the top 8 classifiers on only the benchmarks successfully run by all 8, and found that a neural network, elm kernel matlab, was competitive with random forests (Table 1), even having the highest mean accuracy (albeit by a very small, insignificant margin).
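The re-evaluation described above (mean percent accuracy restricted to the benchmarks completed by every classifier) can be sketched as follows. The table values, data-set names, and column labels here are hypothetical placeholders, not the paper's actual numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical accuracy table: rows = benchmark data sets, columns = classifiers.
# NaN marks a benchmark on which a classifier failed to run.
acc = pd.DataFrame(
    {
        "parRF_t": [94.2, 81.0, np.nan, 77.5],
        "elm_kernel_m": [94.8, 80.1, 66.3, 78.0],
    },
    index=["d1", "d2", "d3", "d4"],
)

# Restrict to benchmarks successfully run by every classifier,
# then average each classifier's percent accuracy over that common subset.
common = acc.dropna(axis=0, how="any")
mean_acc = common.mean(axis=0)
print(mean_acc.sort_values(ascending=False))
```

Dropping the rows with any failure (rather than excluding failures per classifier) is what makes the comparison fair: every classifier is averaged over exactly the same benchmarks.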
Researcher Affiliation | Collaboration | Michael Wainberg EMAIL, Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada; Deep Genomics, Toronto, ON M5G 1L7, Canada
Pseudocode | No | The paper describes methods and results in prose and tables but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code to reproduce the three tables and calculate the sum of positive and negative accuracy differences between pairs of classifiers is available as a supplement to this paper.
Open Datasets | Yes | The JMLR study "Do we need hundreds of classifiers to solve real world classification problems?" benchmarks 179 classifiers in 17 families on 121 data sets from the UCI repository and claims that "the random forest is clearly the best family of classifiers". ... Partitions are available at http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz.
Dataset Splits | Yes | One training and one test set are generated randomly (each with 50% of the available patterns) [...]. This couple of sets is used only for parameter tuning (in those classifiers which have tunable parameters), selecting the parameter values which provide the best accuracy on the test set. [...] Then, using the selected values for the tunable parameters, a 4-fold cross-validation is developed using the whole available data. [...] The test result is the average over the 4 test sets.
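The quoted protocol can be sketched with scikit-learn; the classifier, synthetic data, and candidate parameter grid below are illustrative assumptions, not the study's actual code. Note that the data used for tuning is reused inside the final cross-validation, which is exactly the leakage concern the comment paper raises:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Step 1: one random 50/50 split used only for parameter tuning;
# the value scoring best on the tuning "test" half is selected.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.5, random_state=0)
best_score, best_p = -1.0, None
for p in [2, 4, 6, 8]:  # hypothetical candidate values for the tuned parameter
    clf = RandomForestClassifier(max_features=p, random_state=0).fit(X_tr, y_tr)
    s = clf.score(X_te, y_te)
    if s > best_score:
        best_score, best_p = s, p

# Step 2: 4-fold cross-validation on the WHOLE available data with the
# selected value; the reported figure is the mean accuracy over the 4 folds.
scores = cross_val_score(
    RandomForestClassifier(max_features=best_p, random_state=0), X, y, cv=4
)
print(best_p, scores.mean())
```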
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments or re-evaluations.
Software Dependencies | No | The paper mentions software such as 'elm kernel matlab', 'rf caret', 'par RF caret', and 'svm C', but does not provide version numbers for any of these components.
Experiment Setup | Yes | One training and one test set are generated randomly (each with 50% of the available patterns) [...]. This couple of sets is used only for parameter tuning (in those classifiers which have tunable parameters), selecting the parameter values which provide the best accuracy on the test set. ... par RF t uses a grid search of mtry from 2 to 8 in steps of 2; rf t searches from 2 to 29 in steps of 3; and rforest R sets mtry = #features.
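As a quick illustration, the three search grids described above expand to the following candidate values; naming the tuned parameter `mtry` and the example feature count of 16 are assumptions based on the text, not taken from the paper:

```python
# Reconstruction of the three mtry search grids described above.
par_rf_grid = list(range(2, 9, 2))   # par RF t: 2 to 8 in steps of 2
rf_t_grid = list(range(2, 30, 3))    # rf t: 2 to 29 in steps of 3
n_features = 16                      # hypothetical data-set dimensionality
rforest_r_mtry = n_features          # rforest R: mtry fixed to #features

print(par_rf_grid)  # [2, 4, 6, 8]
print(rf_t_grid)    # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
```

The three settings differ only in how many candidate values they try for the same parameter, which is why their tuned accuracies can differ even within the random-forest family.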