Tunability: Importance of Hyperparameters of Machine Learning Algorithms

Authors: Philipp Probst, Anne-Laure Boulesteix, Bernd Bischl

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Secondly, we conduct a large-scale benchmarking study based on 38 datasets from the OpenML platform and six common machine learning algorithms. We apply our measures to assess the tunability of their parameters. Our results yield default values for hyperparameters and enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters and to choose adequate hyperparameter spaces for tuning.
Researcher Affiliation | Academia | Philipp Probst, EMAIL, Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, 81377 München, Germany; Anne-Laure Boulesteix, EMAIL, Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, 81377 München, Germany; Bernd Bischl, EMAIL, Department of Statistics, LMU Munich, Ludwigstraße 33, 80539 München, Germany
Pseudocode | No | The paper describes methods and procedures in prose, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The fully reproducible R code for all computations and analyses of our paper can be found on the github page: https://github.com/PhilippPro/tunability.
Open Datasets | Yes | We use a specific subset of carefully curated classification datasets from the OpenML platform called OpenML100 (Bischl et al., 2017a). For our study we only use the 38 binary classification tasks that do not contain any missing values.
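The selection rule quoted above can be sketched as a simple filter over task metadata. The records below are made up for illustration; the actual study queries the OpenML platform (and its OpenML100 benchmark suite) for this information.

```python
# Illustrative sketch of the dataset-selection rule: from an OpenML100-style
# metadata listing, keep only binary classification tasks without missing
# values. Task IDs and counts here are hypothetical placeholders.
meta = [
    {"task_id": 3,  "n_classes": 2, "n_missing_values": 0},
    {"task_id": 31, "n_classes": 2, "n_missing_values": 0},
    {"task_id": 37, "n_classes": 3, "n_missing_values": 0},    # multiclass: excluded
    {"task_id": 44, "n_classes": 2, "n_missing_values": 120},  # missing values: excluded
]

selected = [t["task_id"] for t in meta
            if t["n_classes"] == 2 and t["n_missing_values"] == 0]
print(selected)  # -> [3, 31]
```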
Dataset Splits | Yes | The performance estimation for the different hyperparameter experiments is computed through 10-fold cross-validation. For the comparison of surrogate models 10 times repeated 10-fold cross-validation is used.
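The two resampling protocols quoted above can be sketched as follows, using scikit-learn as a stand-in for the authors' R/mlr pipeline and a synthetic dataset in place of the OpenML tasks (both are assumptions for illustration).

```python
# Sketch of the performance-estimation protocols: plain 10-fold CV and
# 10-times-repeated 10-fold CV. Dataset and learner are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0)

# 10-fold CV, as used for the hyperparameter experiments
scores = cross_val_score(
    clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
print(f"10-fold CV accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")

# 10 times repeated 10-fold CV, as used for comparing surrogate models
rep_scores = cross_val_score(
    clf, X, y, cv=RepeatedKFold(n_splits=10, n_repeats=10, random_state=0))
print(f"10x10 repeated CV accuracy: {rep_scores.mean():.3f}")
```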
Hardware Specification | No | The paper discusses software tools and parallelization but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | All our experiments are executed in R and are run through a combination of custom code from our random bot (Kühn et al., 2018b), the OpenML R package (Casalicchio et al., 2017), mlr (Bischl et al., 2016) and batchtools (Lang et al., 2017) for parallelization. All results are uploaded to the OpenML platform and are publicly available there for further analysis. mlr is also used to compare and fit all surrogate regression models.
Experiment Setup | Yes | The algorithms considered in this paper are common methods for supervised learning. We examine elastic net (glmnet R package), decision tree (rpart), k-nearest neighbors (kknn), support vector machine (svm), random forest (ranger) and gradient boosting (xgboost). For more details about the used software packages see Kühn et al. (2018b). An overview of their considered hyperparameters is displayed in Table 1, including respective data types, box-constraints and a potential transformation function. [...] We sample these points from independent uniform distributions where the respective support for each parameter is displayed in Table 1. [...] For the estimation of the defaults for each algorithm we randomly sample 100000 points in the hyperparameter space as defined in Table 1 and determine the configuration with the minimal average risk. The same strategy with 100000 random points is used to obtain the best hyperparameter setting on each dataset that is needed for the estimation of the tunability of an algorithm. For the estimation of the tunability of single hyperparameters we also use 100000 random points for each parameter, while for the tunability of combinations of hyperparameters we only use 10000 random points to reduce runtime as this should be enough to cover 2-dimensional hyperparameter spaces.
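The defaults-estimation strategy described above (sample uniformly from a transformed hyperparameter space, then pick the configuration with minimal risk averaged over all datasets) can be sketched as below. The parameter names, ranges, and per-dataset risk functions are illustrative placeholders, not the paper's Table 1 values or its fitted surrogate models, and the sample size is reduced from 100,000 for speed.

```python
# Sketch of random-sampling-based default estimation. All ranges and risk
# surfaces are hypothetical; the paper fits one surrogate model per dataset.
import numpy as np

rng = np.random.default_rng(0)
n_points = 10_000  # paper: 100,000; reduced here to keep the sketch fast

# (lower, upper, transformation) per hyperparameter, mimicking the Table 1
# format with log-scale parameters transformed via 2**x (assumed ranges)
space = {
    "eta":       (-10.0, 0.0,  lambda x: 2.0 ** x),
    "max_depth": (1.0,   15.0, lambda x: int(round(x))),
    "lambda":    (-10.0, 10.0, lambda x: 2.0 ** x),
}

def sample_config():
    """Draw one point uniformly per parameter, then apply its transformation."""
    return {name: tf(rng.uniform(lo, hi)) for name, (lo, hi, tf) in space.items()}

configs = [sample_config() for _ in range(n_points)]

# Placeholder risk surfaces: each of the 38 "datasets" prefers a different
# learning rate (stand-in for the paper's surrogate-model predictions).
opt_eta = {d: 2.0 ** np.random.default_rng(d).uniform(-6.0, -2.0)
           for d in range(38)}

def risk(cfg, eta_star):
    return (np.log2(cfg["eta"]) - np.log2(eta_star)) ** 2 + 0.01 * cfg["max_depth"]

# Default = configuration with minimal risk averaged over all datasets
avg_risk = [np.mean([risk(c, e) for e in opt_eta.values()]) for c in configs]
best = configs[int(np.argmin(avg_risk))]
print("estimated default configuration:", best)
```

The same loop, run separately per dataset, yields the per-dataset best configurations used in the paper's tunability estimates.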