reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Empirical Evaluation of Resampling Procedures for Optimising SVM Hyperparameters

Authors: Jacques Wainer, Gavin Cawley

JMLR 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This paper presents the results of an extensive empirical evaluation of resampling procedures for SVM hyperparameter selection, designed to address this gap in the machine learning literature. We tested 15 different resampling procedures on 121 binary classification data sets in order to select the best SVM hyperparameters.
Researcher Affiliation	Academia	Jacques Wainer EMAIL Computing Institute University of Campinas Campinas, SP, 13083-852, Brazil Gavin Cawley EMAIL School of Computing Sciences University of East Anglia Norwich, NR4 7TJ, U.K.
Pseudocode	No	The paper describes resampling procedures such as k-fold cross-validation and bootstrap in paragraph text, and lists the investigated procedures in Section 2.1, but does not present any structured pseudocode or algorithm blocks.
Open Source Code	Yes	The 121 data sets, the program used to run the hyperparameter search, the raw results for each resampling procedure and data set, and the program to analyse the results and generate the figures in this paper are available at https://dx.doi.org/10.6084/m9.figshare.1359901.
Open Datasets	Yes	The 121 data sets used in this study were collected from the UCI repository (Lichman, 2013), processed and converted by the authors of Fernández-Delgado et al. (2014) into a unified format.
Dataset Splits	Yes	That is, each data set is divided into two halves; the different resampling procedures are used to select the hyperparameters using the first half, the SVM is trained on this first half, and its error rate is evaluated on the second half. The procedure is repeated using the second half as the training set and the first half as the test set. The estimate of the error rate for the resampling procedure (or, more precisely, the error rate of the SVM with hyperparameters selected by the resampling procedure) is the average of the two measured error rates.
Hardware Specification	No	The paper mentions running experiments on 'a single core (of a multiple core machine)' and distributing data sets to 'different cores of the same machine', but does not provide specific hardware details such as CPU model, GPU type, or memory specifications.
Software Dependencies	No	The paper mentions using 'libSVM' (Chang and Lin, 2011) and statistical tests implemented in 'the libraries of the R programming language', but does not provide specific version numbers for these software components.
Experiment Setup	Yes	For all procedures and data sets, the hyperparameter search procedure used an 11 × 10 grid search (the S set) following the ranges and steps popularized by libsvm (Hsu et al., 2010), i.e., C = {2^-5, 2^-3, ..., 2^15} and γ = {2^-15, 2^-13, ..., 2^3}.
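The Pseudocode row notes that the paper describes k-fold cross-validation and the bootstrap only in prose. As a rough illustration of what these two families of resampling procedures do, the sketch below generates (training, validation) index splits for each. It is not the authors' code: the function names, the plain random (unstratified) folds, and the use of out-of-bag points as the bootstrap validation set are assumptions for illustration, and the exact variants among the 15 procedures the paper tested may differ.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, validation indices)


def kfold_indices(n: int, k: int, seed: int = 0) -> Iterator[Split]:
    """Yield k (train, validation) splits; every index is validated exactly once.

    Assumption: plain random folds, no stratification by class.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds whose sizes differ by at most one
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def bootstrap_indices(n: int, rounds: int, seed: int = 0) -> Iterator[Split]:
    """Yield bootstrap resamples: n draws with replacement form the training set,
    and the out-of-bag indices (never drawn) form the validation set (assumed variant)."""
    rng = random.Random(seed)
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        yield train, [i for i in range(n) if i not in drawn]


if __name__ == "__main__":
    # With n=10 and k=5, every split has 8 training and 2 validation indices.
    for train, val in kfold_indices(10, 5):
        print(len(train), len(val))
    for train, val in bootstrap_indices(10, 2):
        print(len(train), sorted(val))
```

In a hyperparameter search, each candidate (C, γ) pair would be scored by averaging the SVM's validation error over these splits, and the pair with the lowest average would be selected.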
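The Dataset Splits row describes an outer two-half evaluation: select hyperparameters and train on one half, test on the other, swap the roles, and average the two error rates. A minimal sketch of that protocol follows, assuming a simple random split (the row does not say whether the halves are random or stratified). The `learner` callable is a hypothetical stand-in for "run a resampling procedure on the training half, then fit the SVM"; the toy `majority_learner` exists only so the example runs without an SVM library.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, class label)
# Hypothetical interface: learner(train, test) selects hyperparameters on `train`
# with some resampling procedure, fits the model on all of `train`, and returns
# predicted labels for `test`.
Learner = Callable[[List[Example], List[Example]], List[int]]


def error_rate(preds: List[int], data: List[Example]) -> float:
    """Fraction of examples whose predicted label differs from the true label."""
    return sum(p != y for p, (_, y) in zip(preds, data)) / len(data)


def two_half_estimate(data: List[Example], learner: Learner, seed: int = 0) -> float:
    """Average test error over the two halves, each used once as the training set."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # assumption: a simple random split
    mid = len(idx) // 2
    first = [data[i] for i in idx[:mid]]
    second = [data[i] for i in idx[mid:]]
    e1 = error_rate(learner(first, second), second)  # train on first, test on second
    e2 = error_rate(learner(second, first), first)   # swap the roles
    return (e1 + e2) / 2


def majority_learner(train: List[Example], test: List[Example]) -> List[int]:
    """Toy stand-in for the SVM: predicts the most common training label."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return [guess] * len(test)


if __name__ == "__main__":
    toy = [([float(i)], int(i >= 7)) for i in range(10)]
    print(two_half_estimate(toy, majority_learner))
```

Because every example is tested exactly once across the two runs, the averaged figure uses the whole data set for evaluation while never testing a model on data it was tuned or trained on.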
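The Experiment Setup row gives the grid explicitly: odd powers of two from 2^-5 to 2^15 for C (11 values) and from 2^-15 to 2^3 for γ (10 values), hence 11 × 10 = 110 candidate pairs. The sketch below enumerates that grid and picks the pair with the lowest score. The function names are hypothetical, and `score` stands in for whatever error estimate a given resampling procedure produces for a (C, γ) pair.

```python
from itertools import product
from typing import Callable, List, Tuple


def libsvm_style_grid() -> List[Tuple[float, float]]:
    """(C, gamma) pairs with C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3."""
    cs = [2.0 ** e for e in range(-5, 16, 2)]      # 11 values of C
    gammas = [2.0 ** e for e in range(-15, 4, 2)]  # 10 values of gamma
    return list(product(cs, gammas))


def select_best(score: Callable[[float, float], float]) -> Tuple[float, float]:
    """Return the grid pair with the lowest estimated error (hypothetical helper)."""
    return min(libsvm_style_grid(), key=lambda pair: score(*pair))


if __name__ == "__main__":
    print(len(libsvm_style_grid()))  # 110
```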