Are Random Forests Truly the Best Classifiers?

Authors: Michael Wainberg, Babak Alipanahi, Brendan J. Frey

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this response, we show that the study's results are biased by the lack of a held-out test set and the exclusion of trials with errors. Further, the study's own statistical tests indicate that random forests do not have significantly higher percent accuracy than support vector machines and neural networks, calling into question the conclusion that random forests are the best classifiers. ... We re-evaluated the mean percent accuracy of the top 8 classifiers on only the benchmarks successfully run by all 8, and found that a neural network, elm kernel matlab, was competitive with random forests (Table 1), even having the highest mean accuracy (albeit by a very small, insignificant margin).
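The re-evaluation described above (mean percent accuracy restricted to the benchmarks completed by every classifier) can be sketched as follows. The table values, data-set names, and column labels here are hypothetical placeholders, not the paper's actual numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical accuracy table: rows = benchmark data sets, columns = classifiers.
# NaN marks a benchmark on which a classifier failed to run.
acc = pd.DataFrame(
    {
        "parRF_t": [94.2, 81.0, np.nan, 77.5],
        "elm_kernel_m": [94.8, 80.1, 66.3, 78.0],
    },
    index=["d1", "d2", "d3", "d4"],
)

# Restrict to benchmarks successfully run by every classifier,
# then average each classifier's percent accuracy over that common subset.
common = acc.dropna(axis=0, how="any")
mean_acc = common.mean(axis=0)
print(mean_acc.sort_values(ascending=False))
```

Dropping the rows with any failure (rather than excluding failures per classifier) is what makes the comparison fair: every classifier is averaged over exactly the same benchmarks.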
Researcher Affiliation | Collaboration | Michael Wainberg EMAIL, Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada; Deep Genomics, Toronto, ON M5G 1L7, Canada
Pseudocode | No | The paper describes methods and results in prose and tables but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code to reproduce the three tables and calculate the sum of positive and negative accuracy differences between pairs of classifiers is available as a supplement to this paper.
Open Datasets | Yes | The JMLR study "Do we need hundreds of classifiers to solve real world classification problems?" benchmarks 179 classifiers in 17 families on 121 data sets from the UCI repository and claims that "the random forest is clearly the best family of classifiers". ... Partitions are available at http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz.
Dataset Splits | Yes | One training and one test set are generated randomly (each with 50% of the available patterns) [...]. This couple of sets is used only for parameter tuning (in those classifiers which have tunable parameters), selecting the parameter values which provide the best accuracy on the test set. [...] Then, using the selected values for the tunable parameters, a 4-fold cross-validation is developed using the whole available data. [...] The test result is the average over the 4 test sets.
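The quoted protocol can be sketched with scikit-learn; the classifier, synthetic data, and candidate parameter grid below are illustrative assumptions, not the study's actual code. Note that the data used for tuning is reused inside the final cross-validation, which is exactly the leakage concern the comment paper raises:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Step 1: one random 50/50 split used only for parameter tuning;
# the value scoring best on the tuning "test" half is selected.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.5, random_state=0)
best_score, best_p = -1.0, None
for p in [2, 4, 6, 8]:  # hypothetical candidate values for the tuned parameter
    clf = RandomForestClassifier(max_features=p, random_state=0).fit(X_tr, y_tr)
    s = clf.score(X_te, y_te)
    if s > best_score:
        best_score, best_p = s, p

# Step 2: 4-fold cross-validation on the WHOLE available data with the
# selected value; the reported figure is the mean accuracy over the 4 folds.
scores = cross_val_score(
    RandomForestClassifier(max_features=best_p, random_state=0), X, y, cv=4
)
print(best_p, scores.mean())
```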
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments or re-evaluations.
Software Dependencies | No | The paper mentions software such as 'elm kernel matlab', 'rf caret', 'par RF caret', and 'svm C', but does not provide version numbers for any of these components.
Experiment Setup | Yes | One training and one test set are generated randomly (each with 50% of the available patterns) [...]. This couple of sets is used only for parameter tuning (in those classifiers which have tunable parameters), selecting the parameter values which provide the best accuracy on the test set. ... par RF t uses a grid search of mtry from 2 to 8 in steps of 2; rf t searches from 2 to 29 in steps of 3; and rforest R sets mtry = #features.
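As a quick illustration, the three search grids described above expand to the following candidate values; naming the tuned parameter `mtry` and the example feature count of 16 are assumptions based on the text, not taken from the paper:

```python
# Reconstruction of the three mtry search grids described above.
par_rf_grid = list(range(2, 9, 2))   # par RF t: 2 to 8 in steps of 2
rf_t_grid = list(range(2, 30, 3))    # rf t: 2 to 29 in steps of 3
n_features = 16                      # hypothetical data-set dimensionality
rforest_r_mtry = n_features          # rforest R: mtry fixed to #features

print(par_rf_grid)  # [2, 4, 6, 8]
print(rf_t_grid)    # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
```

The three settings differ only in how many candidate values they try for the same parameter, which is why their tuned accuracies can differ even within the random-forest family.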