Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers

Authors: Abraham J. Wyner, Matthew Olson, Justin Bleich, David Mease

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a number of examples to support this explanation. In the process, we question the conventional wisdom that boosting algorithms for classification require regularization or early stopping and should be limited to low-complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees, without regularization or early stopping. We will begin with an easy-to-visualize example that demonstrates how fine interpolation can provide robustness in a noisy setting. In particular, we compare the performance of AdaBoost, random forests, and one-nearest neighbors, which are all interpolating classifiers. We will show that the self-averaging property of AdaBoost and random forests is crucial. This property will be discussed in subsequent sections. The implementation of AdaBoost is carried out according to the algorithm described earlier. The base learners used are trees fit by the rpart package (Therneau and Atkinson, 1997) in R. The trees are grown to a maximum depth of 8, meaning they may have at most 2^8 = 256 terminal nodes. This will be the implementation of AdaBoost we will consider throughout the remainder of this paper. We will consider again the pure-noise model as described in the previous section, where the probability that y is equal to +1 for every x is some constant value p > .5. For the training data we will take n = 400 points uniformly sampled on [0, 1]^2 according to a Latin hypercube using the midpoints as before. For the corresponding y values in the training data we will randomly choose 80 points to be -1s so that P(y = 1|x) = .8. Figure 5 displays the results for the following: (a) one-NN, (b) AdaBoost, and (c) random forests. We now repeat the simulation in Section 3.3 with a larger sample size and in 20 dimensions instead of 2.
Specifically, the training data now has n = 5000 observations sampled according to the midpoints of a Latin hypercube design uniformly on [0, 1]^20. We again randomly select 20%, or 1000, of these points to be -1s, with the remaining 4000 being +1s. Since in 20 dimensions it is difficult to display the resulting classification rules graphically, we instead examine the rules on a hold-out sample of 10,000 points sampled uniformly and independently on [0, 1]^20. Figure 8 plots the proportion of points in the hold-out sample classified by AdaBoost as +1 as a function of the number of iterations. We repeated a version of the analysis conducted in Section 5.1 for five additional data sets from the UCI repository: Haberman, Wisconsin breast cancer, voting, Pima, and German credit. In our analysis we did the following: we added 5% label noise to the training data, imputed missing values with the mean prior to model fitting, and conducted all experiments on 50 random training-testing splits of each data set.
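The simulation design quoted above (midpoint Latin hypercube inputs, a fixed fraction of randomly chosen -1 labels, and an independent uniform hold-out sample) is straightforward to reproduce. Below is a minimal NumPy sketch, not the authors' R code; the function name and seed are ours:

```python
import numpy as np

def midpoint_lhs(n, d, rng):
    """Midpoint Latin hypercube on [0, 1]^d: each axis is cut into n equal
    cells and the cell midpoints are matched up by random permutations."""
    mids = (np.arange(n) + 0.5) / n
    return np.column_stack([rng.permutation(mids) for _ in range(d)])

rng = np.random.default_rng(0)
X = midpoint_lhs(5000, 20, rng)            # training inputs, n = 5000 in 20-d
y = np.ones(5000, dtype=int)               # pure-noise labels: P(y = +1|x) = .8
y[rng.choice(5000, size=1000, replace=False)] = -1   # 20% chosen to be -1s
X_hold = rng.random((10000, 20))           # hold-out sample, uniform on [0, 1]^20
```

By construction, each coordinate of `X` hits every one of the 5000 axis cells exactly once, which is what makes the design space-filling even in 20 dimensions.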
Researcher Affiliation | Collaboration | Abraham J. Wyner (EMAIL), Matthew Olson (EMAIL), Justin Bleich (EMAIL): Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA. David Mease (EMAIL): Apple Inc.
Pseudocode | Yes | Algorithm 1: AdaBoost (Hastie et al., 2009)
1. Initialize the observation weights w_i = 1/N, i = 1, 2, ..., N.
2. For m = 1 to M:
   (a) Fit a classifier G_m(x) to the training data using weights w_i.
   (b) Compute err_m = [sum_{i=1}^{N} w_i I(y_i != G_m(x_i))] / [sum_{i=1}^{N} w_i].
   (c) Compute alpha_m = log((1 - err_m) / err_m).
   (d) Set w_i <- w_i exp(alpha_m I(y_i != G_m(x_i))).
   (e) Set f_m(x) = sum_{t=1}^{m} alpha_t G_t(x).
3. Output f(x) = sign(f_M(x)).
...
Algorithm 2: Random Forests (Hastie et al., 2009)
1. For b = 1 to B:
   (a) Draw a bootstrap sample X* of size N from the training data.
   (b) Grow a decision tree T_b on X* by repeating the following recursively until the minimum node size n_min is reached:
      i. Select m of the p variables.
      ii. Pick the best variable/split-point among the m variables and partition.
2. Output the ensemble {T_b}_{b=1}^{B}.
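Algorithm 1 translates almost line for line into code. The sketch below uses weighted decision stumps as the base learner purely to stay self-contained (the paper's implementation instead grows rpart trees to depth 8); all function names are ours, and the early-exit handling of degenerate base learners is an assumption of this sketch:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted base learner for step 2(a): best (feature, threshold, sign)
    stump under observation weights w. Labels y are coded +/-1."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def stump_predict(X, j, t, s):
    return np.where(X[:, j] <= t, s, -s)

def adaboost_fit(X, y, M=50):
    """Discrete AdaBoost, steps 1-2 of Algorithm 1."""
    N = len(y)
    w = np.full(N, 1.0 / N)                      # step 1: uniform weights
    model = []
    for _ in range(M):                           # step 2
        err, j, t, s = fit_stump(X, y, w)        # (a) fit G_m with weights w
        err /= w.sum()                           # (b) weighted error err_m
        if err <= 0:                             # perfect fit: keep it and stop
            model.append((1.0, j, t, s))
            break
        if err >= 0.5:                           # no better than chance: stop
            break
        alpha = np.log((1 - err) / err)          # (c) alpha_m
        miss = stump_predict(X, j, t, s) != y
        w = w * np.exp(alpha * miss)             # (d) upweight mistakes
        model.append((alpha, j, t, s))
    return model

def adaboost_predict(X, model):
    """Step 3: sign of the weighted vote f_M(x)."""
    f = sum(a * stump_predict(X, j, t, s) for a, j, t, s in model)
    return np.sign(f)
```

Note that step 2(d) only reweights the misclassified points, so the weights never need explicit renormalization as long as err_m is computed relative to the current total weight, as in step 2(b).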
Open Source Code | No | The paper mentions using the 'rpart package (Therneau and Atkinson, 1997) in R' and the 'R implementation randomForest (Liaw and Wiener, 2002)'. These are third-party tools that were utilized. The authors do not provide a specific link or explicit statement about releasing their own source code for the methodology described in this paper.
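Although the authors relied on the R randomForest package, Algorithm 2 itself is compact enough to sketch directly. The following is a minimal, self-contained NumPy version (not the authors' code); the Gini criterion, the majority-vote leaves, and all names are assumptions of this sketch:

```python
import numpy as np

def gini(y):
    """Gini impurity for labels coded +/-1."""
    p = (y == 1).mean()
    return 2 * p * (1 - p)

def grow_tree(X, y, m, n_min, rng):
    """Step 1(b) of Algorithm 2: at each node, try m randomly chosen
    features and split on the cut minimizing weighted child impurity."""
    if len(y) <= n_min or len(set(y)) == 1:
        return ('leaf', 1 if y.sum() >= 0 else -1)      # majority-vote leaf
    best = None
    for j in rng.choice(X.shape[1], size=m, replace=False):
        for t in np.unique(X[:, j])[:-1]:               # candidate cut points
            left = X[:, j] <= t
            imp = sum(s.sum() * gini(y[s]) for s in (left, ~left))
            if best is None or imp < best[0]:
                best = (imp, j, t, left)
    if best is None:                                     # no valid split
        return ('leaf', 1 if y.sum() >= 0 else -1)
    _, j, t, left = best
    return ('split', j, t,
            grow_tree(X[left], y[left], m, n_min, rng),
            grow_tree(X[~left], y[~left], m, n_min, rng))

def tree_predict(node, x):
    while node[0] == 'split':
        _, j, t, l, r = node
        node = l if x[j] <= t else r
    return node[1]

def forest_predict(X_tr, y_tr, X_te, B=25, m=None, n_min=1, seed=0):
    """Algorithm 2: B bootstrap trees, then a majority vote (step 2)."""
    rng = np.random.default_rng(seed)
    m = m or max(1, int(np.sqrt(X_tr.shape[1])))         # common default for m
    trees = []
    for _ in range(B):
        idx = rng.integers(0, len(y_tr), len(y_tr))      # step 1(a): bootstrap
        trees.append(grow_tree(X_tr[idx], y_tr[idx], m, n_min, rng))
    votes = np.array([[tree_predict(T, x) for x in X_te] for T in trees])
    return np.sign(votes.sum(axis=0))
```

Setting n_min = 1 grows each tree until its leaves are pure, i.e., until each tree interpolates its bootstrap sample, which is the regime the paper argues for.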
Open Datasets | Yes | The original data can be found at the following address: https://www.elen.ucl.ac.be/neuralnets/Research/Projects/ELENA/databases/REAL/phoneme/phoneme.txt. We repeated a version of the analysis conducted in Section 5.1 for five additional data sets from the UCI repository: Haberman, Wisconsin breast cancer, voting, Pima, and German credit.
Dataset Splits | Yes | We begin by dividing the phoneme data set into a training set consisting of 70% of the samples and a testing set consisting of the remaining 30%. Specifically, the training data now has n = 5000 observations sampled according to the midpoints of a Latin hypercube design uniformly on [0, 1]^20. We again randomly select 20%, or 1000, of these points to be -1s, with the remaining 4000 being +1s. Since in 20 dimensions it is difficult to display the resulting classification rules graphically, we instead examine the rules on a hold-out sample of 10,000 points sampled uniformly and independently on [0, 1]^20.
Hardware Specification | No | The paper describes various simulations and analyses, including the use of R packages and decision trees, but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud resources) used to run these experiments.
Software Dependencies | No | The paper mentions the use of the 'rpart package (Therneau and Atkinson, 1997) in R' and the 'R implementation randomForest (Liaw and Wiener, 2002)'. While it names the software and programming language, it does not provide specific version numbers for R, the rpart package, or the randomForest package, which are necessary for a reproducible description of ancillary software.
Experiment Setup | Yes | The base learners used are trees fit by the rpart package (Therneau and Atkinson, 1997) in R. The trees are grown to a maximum depth of 8, meaning they may have at most 2^8 = 256 terminal nodes. This will be the implementation of AdaBoost we will consider throughout the remainder of this paper. We run AdaBoost for 500 iterations, fit a random forests model with 500 trees, and build a CART tree that is pruned using cross-validation. We repeated a version of the analysis conducted in Section 5.1 for five additional data sets from the UCI repository: Haberman, Wisconsin breast cancer, voting, Pima, and German credit. In our analysis we added 5% label noise to the training data, imputed missing values with the mean prior to model fitting, and conducted all experiments on 50 random training-testing splits of each data set.
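The evaluation protocol described in this row (mean imputation, 5% label noise on the training side only, repeated random splits) can be sketched generically. This is our own scaffolding, not the authors' scripts: the function names, the seed, and the 70/30 split proportion (borrowed from the phoneme experiment; the paper does not state the split proportion for the UCI repeats) are assumptions:

```python
import numpy as np

def run_experiment(X, y, fit_predict, n_reps=50, noise=0.05,
                   test_frac=0.3, seed=0):
    """Average test error of `fit_predict(X_train, y_train, X_test)` over
    n_reps random train/test splits. Labels y are coded +/-1. Label noise
    is injected into the training labels only; test labels stay clean."""
    rng = np.random.default_rng(seed)
    X = np.where(np.isnan(X), np.nanmean(X, axis=0), X)   # mean imputation
    errors = []
    for _ in range(n_reps):
        idx = rng.permutation(len(y))
        n_test = int(test_frac * len(y))
        test, train = idx[:n_test], idx[n_test:]
        y_tr = y[train].copy()
        flip = rng.choice(len(train), size=int(noise * len(train)),
                          replace=False)
        y_tr[flip] *= -1                                   # 5% label noise
        y_hat = fit_predict(X[train], y_tr, X[test])
        errors.append(np.mean(y_hat != y[test]))
    return float(np.mean(errors))
```

Any of the classifiers discussed in the paper can be dropped in as `fit_predict`; keeping the noise injection inside the loop ensures each of the 50 replicates flips an independent random 5% of its training labels.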