Are Random Forests Truly the Best Classifiers?
Authors: Michael Wainberg, Babak Alipanahi, Brendan J. Frey
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this response, we show that the study's results are biased by the lack of a held-out test set and the exclusion of trials with errors. Further, the study's own statistical tests indicate that random forests do not have significantly higher percent accuracy than support vector machines and neural networks, calling into question the conclusion that random forests are the best classifiers. ... We re-evaluated the mean percent accuracy of the top 8 classifiers on only the benchmarks successfully run by all 8, and found that a neural network, elm kernel matlab, was competitive with random forests (Table 1), even having the highest mean accuracy (albeit by a very small, insignificant margin). |
| Researcher Affiliation | Collaboration | Michael Wainberg EMAIL Department of Electrical and Computer Engineering University of Toronto, Toronto, ON M5S 3G4, Canada; Deep Genomics, Toronto, ON M5G 1L7, Canada |
| Pseudocode | No | The paper describes methods and results in prose and tables but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce the three tables and calculate the sum of positive and negative accuracy differences between pairs of classifiers is available as a supplement to this paper. |
| Open Datasets | Yes | The JMLR study "Do we need hundreds of classifiers to solve real world classification problems?" benchmarks 179 classifiers in 17 families on 121 data sets from the UCI repository and claims that the random forest is clearly "the best family of classifiers". ... Partitions are available at http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz. |
| Dataset Splits | Yes | One training and one test set are generated randomly (each with 50% of the available patterns) [...]. This couple of sets is used only for parameter tuning (in those classifiers which have tunable parameters), selecting the parameter values which provide the best accuracy on the test set. [...] Then, using the selected values for the tunable parameters, a 4-fold cross validation is developed using the whole available data. [...] The test result is the average over the 4 test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments or re-evaluations. |
| Software Dependencies | No | The paper mentions software like 'elm kernel matlab', 'rf caret', 'par RF caret', 'svm C', etc., but does not provide specific version numbers for any of these components. |
| Experiment Setup | Yes | One training and one test set are generated randomly (each with 50% of the available patterns) [...]. This couple of sets is used only for parameter tuning (in those classifiers which have tunable parameters), selecting the parameter values which provide the best accuracy on the test set. ... par RF t uses a grid search of 2 to 8 in steps of 2; rf t searches from 2 to 29 in steps of 3, and rforest R sets mtry = #features |
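The tuning-then-CV protocol quoted above (a random 50/50 split used only to select parameter values, followed by a 4-fold cross-validation on the whole data set whose mean accuracy is reported) can be sketched in plain Python. This is a minimal illustration, not the authors' code: `tune_and_evaluate` and `train_fn` are hypothetical names, and the classifier abstraction (a function mapping a sample to a label) is an assumption for brevity.

```python
import random

def tune_and_evaluate(X, y, train_fn, param_grid, seed=0):
    """Sketch of the quoted protocol: tune a parameter on a random
    50/50 split, then report mean accuracy of a 4-fold CV on all data.

    train_fn(rows, param) -> model, where model(x) predicts a label.
    (Hypothetical interface, chosen only for this illustration.)
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    tune_train, tune_test = idx[:half], idx[half:]

    def accuracy(model, rows):
        hits = sum(model(X[i]) == y[i] for i in rows)
        return hits / len(rows)

    # Select the parameter value with the best accuracy on the tuning
    # "test" half -- the step the comment criticizes, since that half
    # is not a held-out set.
    best_param = max(
        param_grid,
        key=lambda p: accuracy(train_fn(tune_train, p), tune_test),
    )

    # 4-fold cross-validation on the *whole* data with the chosen value;
    # the reported score is the average over the 4 test folds.
    folds = [idx[i::4] for i in range(4)]
    scores = []
    for k in range(4):
        train_rows = [i for j, f in enumerate(folds) if j != k for i in f]
        scores.append(accuracy(train_fn(train_rows, best_param), folds[k]))
    return best_param, sum(scores) / 4
```

For a random-forest-style grid (e.g. mtry from 2 to 8 in steps of 2, as described for par RF t), `param_grid` would be `range(2, 9, 2)` and `train_fn` would fit a forest with that mtry on the given rows.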