A Unified View on Multi-class Support Vector Classification

Authors: Ürün Doğan, Tobias Glasmachers, Christian Igel

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compared the performance of nine different multi-class SVMs in a thorough empirical study. Our results suggest using the Weston & Watkins SVM, which can be trained comparatively fast and gives good accuracies on benchmark data sets. If training time is a major concern, the one-vs-all approach is the method of choice.
Researcher Affiliation | Collaboration | Ürün Doğan (Microsoft Research), Tobias Glasmachers (Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany), Christian Igel (Department of Computer Science, University of Copenhagen, Denmark)
Pseudocode | No | The paper includes mathematical formulations for SVMs and dual problems, such as Equation (3) for the regularized empirical risk and Equation (5) for the dual problem, but it does not present any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | Yes | All algorithms were implemented in the Shark open source machine learning library (Igel et al., 2008). Source code for reproducing the results can be found in the supplementary material.
Open Datasets | Yes | First, twelve standard benchmark data sets were considered for non-linear SVM learning, and careful model selection was conducted. This already makes our experiments the most extensive comparison of multi-class SVMs so far. Second, additional experiments for linear SVMs are presented later in this section. [...] These data sets are available from the libsvm data collection.
Dataset Splits | Yes | Repeated cross-validation was employed as model selection criterion. Five-fold cross-validation was repeated ten times using ten independent random splits into five folds. This stabilizes the model selection procedure, especially for small data sets. [...] Using the best parameters found during model selection, 100 machines were trained on 100 random splits into training and test data (preserving the original set sizes).
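The repeated cross-validation protocol quoted above (five folds, ten independent random splits) can be sketched as follows. This is a minimal illustration, not code from the paper or from Shark; the function name and seeding are assumptions.

```python
import random

def repeated_kfold(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation.

    Mirrors the described protocol: five-fold cross-validation repeated
    ten times on independent random splits.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        rng.shuffle(indices)  # a fresh random split for each repeat
        base, rem = divmod(n_samples, n_folds)
        folds, start = [], 0
        for i in range(n_folds):
            # Distribute any remainder over the first `rem` folds.
            end = start + base + (1 if i < rem else 0)
            folds.append(indices[start:end])
            start = end
        for i in range(n_folds):
            test = folds[i]
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            yield train, test

splits = list(repeated_kfold(150))  # 10 repeats x 5 folds = 50 splits
```

Each of the 50 (train, test) pairs partitions the sample indices, so a model-selection score can be averaged over all of them before picking hyperparameters.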
Hardware Specification | No | The paper mentions that training time was measured 'on a single core' and discusses linear vs. non-linear machines, but it does not specify any particular CPU or GPU models, or other hardware components used for running the experiments.
Software Dependencies | No | The paper states: 'All algorithms were implemented in the Shark open source machine learning library (Igel et al., 2008).' While it names a software library, it does not provide a specific version number for Shark or any other key software components used in the implementation.
Experiment Setup | Yes | The bandwidth γ of the Gaussian kernel and the regularization parameter C of the machine were determined by nested grid search. [...] We set the initial grid to γ ∈ {2^(−12+3i) | i = 0, 1, ..., 4} and C ∈ {2^(3i) | i = 0, 1, ..., 4}. Let (γ0, C0) denote the parameter configuration picked in the first stage. Then in the second stage the parameters were further refined on the grid γ ∈ {2^i γ0 | i = −2, −1, 0, 1, 2} and C ∈ {2^i C0 | i = −2, −1, 0, 1, 2}.
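The two-stage grid refinement described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: `evaluate(gamma, C)` stands in for a user-supplied function that returns a cross-validation score (assumed higher-is-better), and the function name is an assumption.

```python
def two_stage_grid_search(evaluate):
    """Two-stage nested grid search over the Gaussian-kernel bandwidth
    gamma and the regularization parameter C, using the grids quoted above.
    """
    # Stage 1: coarse grid, gamma in {2^(-12+3i)}, C in {2^(3j)}, i, j = 0..4.
    coarse = [(2.0 ** (-12 + 3 * i), 2.0 ** (3 * j))
              for i in range(5) for j in range(5)]
    gamma0, c0 = max(coarse, key=lambda gc: evaluate(*gc))
    # Stage 2: refine around (gamma0, C0) by factors 2^i, i = -2..2.
    fine = [(gamma0 * 2.0 ** i, c0 * 2.0 ** j)
            for i in range(-2, 3) for j in range(-2, 3)]
    return max(fine, key=lambda gc: evaluate(*gc))
```

The coarse stage evaluates 25 configurations and the refinement another 25, so each model-selection run costs 50 cross-validation evaluations per machine.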