Bridging performance gap between minimal and maximal SVM models
Authors: Ondrej Šuch, René Fabricius
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to uncover metrical and topological properties that impact the accuracy of a multi-class SVM model. Based on their results we propose a way to construct intermediate multi-class SVM models. In our work we use convolutional data sets, which have multiple advantages for benchmarking multi-class SVM models. |
| Researcher Affiliation | Academia | Ondrej Šuch EMAIL Mathematical institute of Slovak Academy of Sciences Ďumbierska 1 Banská Bystrica, 974 01, Slovakia René Fabricius EMAIL Faculty of Management and Informatics Žilinská Univerzita v Žiline Univerzitná 8215/1 Žilina, 010 26, Slovakia |
| Pseudocode | Yes | Algorithm 1 Incremental multi-class SVM model creation — 1: procedure Grow_SVM(T, V) 2: # Argument T is a training dataset with N classes 3: # Argument V is a validation dataset (possibly V ⊆ T) 4: 5: Choose a spanning tree with star topology at random. Denote by E the set of its edges. 6: for e := (i, j) ∈ E do 7: Train the probabilistic pairwise SVM model on S distinguishing classes i and j 8: Set step ← 1 9: Set F to be the complement of E in the set of edges of the complete graph on N vertices. 10: while F is nonempty do 11: yield(tuple(step = step, model graph = E)) 12: Compute confusion matrix X on V using (7) to fill missing pairwise odds 13: Set Y = X + Xᵀ 14: Choose an edge f = (i, j) in F such that Y_ij = max_{(m,n)∈F} Y_mn 15: Train the probabilistic pairwise SVM model on S distinguishing classes i and j 16: Set F ← F \ {f} 17: Set E ← E ∪ {f} |
| Open Source Code | Yes | The source repository can be found on https://github.com/ondrej-such/svm3. |
| Open Datasets | Yes | An overview of the datasets used throughout the paper is summarized in Table 1. The number of classes was selected to balance the requirements of being able to run multiple experiments to get error bounds and to represent a large-class SVM problem while being within our computational capacity. In sections 3 and 4 we have used three datasets each having ten classes: CIFAR-10 (Krizhevsky et al., 2009), Imagenette, and Imagewoof (Howard, 2022), the latter two being subsets of the well-known Imagenet 2012 dataset (Russakovsky et al., 2015). |
| Dataset Splits | Yes | Table 1: Summary of datasets used in the experiments. Train samples for SVM include augmented data samples. Columns: dataset name / classes / # samples when training a neural network (per class) / # samples when training SVM (per class) / # samples in the testing dataset (per class). CIFAR-10: 10 / 6000 / 10000 / 1000. Imagenette: 10 / 1000 / 10000 / 50. Imagewoof: 10 / 1000 / 10000 / 50. Imagenet-50: 50 / 1000 / 10000 / 50. Each training matrix had 10000 entries per class, which included all training samples and also augmented data samples. We set aside 200 extra samples for each class from the training dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory configurations used for running experiments. |
| Software Dependencies | Yes | For computations of SVM, we use R package e1071 (R Core Team, 2021; Meyer et al., 2020). For data processing, we use tidyverse (Wickham et al., 2019) and for visualizations package ggplot2 (Wickham, 2016). Meyer et al., 2020 is listed as "e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien, 2020. URL https://CRAN.R-project.org/package=e1071. R package version 1.7-4." |
| Experiment Setup | Yes | For CIFAR-10 we have used a custom architecture by David C. Page, which can be quickly trained to 94% accuracy (Page, 2019). For Imagenette and Imagewoof datasets, we used the Resnet-18 network (He et al., 2016). This architecture was designed to solve the Imagenet classification problem, which has 1000 classes. We have trained 20 instances of each architecture on CIFAR-10 and Imagenet respectively. We used default parameters for the svm function from the e1071 library. In particular, data were scaled, and we used default kernel and cost parameters. |
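The Grow_SVM procedure in the Pseudocode row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the training and confusion-matrix routines are passed in as stand-in callables (the paper's equation (7) for filling missing pairwise odds is replaced by an opaque `confusion_on_validation` argument), and the step counter is assumed to increment once per added edge.

```python
# Sketch of Algorithm 1 (Grow_SVM): start from a random star spanning tree,
# then greedily add the remaining edge whose class pair shows the most
# mutual confusion on the validation set.
import itertools
import random

def grow_svm(train_pairwise, confusion_on_validation, n_classes):
    """Yield (step, edge_set) snapshots of the growing model graph.

    train_pairwise(i, j)       -- trains a probabilistic pairwise SVM (stub here)
    confusion_on_validation(E) -- returns an n x n confusion matrix X for the
                                  model with edge set E (stand-in for eq. (7))
    """
    center = random.randrange(n_classes)
    # Spanning tree with star topology rooted at a random class
    E = {tuple(sorted((center, k))) for k in range(n_classes) if k != center}
    for (i, j) in E:
        train_pairwise(i, j)
    all_edges = set(itertools.combinations(range(n_classes), 2))
    F = all_edges - E          # edges of the complete graph not yet trained
    step = 1
    while F:
        yield step, frozenset(E)
        X = confusion_on_validation(E)
        # Symmetrize: Y = X + X^T, so Y[i][j] counts confusion in both directions
        Y = [[X[m][n] + X[n][m] for n in range(n_classes)]
             for m in range(n_classes)]
        # Choose the untrained edge (i, j) maximizing Y[i][j]
        f = max(F, key=lambda e: Y[e[0]][e[1]])
        train_pairwise(*f)
        F.discard(f)
        E.add(f)
        step += 1
    yield step, frozenset(E)
```

For N classes the star tree contributes N - 1 pairwise models and the loop adds the remaining (N - 1)(N - 2)/2, ending with the full N(N - 1)/2 one-vs-one ensemble.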
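The setup row states that the svm function from e1071 was used with defaults (scaled data, default kernel and cost). As an illustrative analogue only, not the authors' code: e1071's documented defaults are an RBF kernel with cost C = 1 and gamma = 1/n_features, which roughly corresponds to the following scikit-learn pipeline (`gamma="auto"` is sklearn's 1/n_features setting, and `probability=True` plays the role of the probabilistic pairwise outputs).

```python
# Illustrative scikit-learn analogue of the e1071 svm() defaults described
# in the paper; dataset and parameter mapping are assumptions.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = make_pipeline(
    StandardScaler(),                  # e1071 scales data by default
    SVC(C=1.0, kernel="rbf", gamma="auto",  # e1071 defaults: cost=1, radial kernel
        probability=True, random_state=0),  # Platt-scaled pairwise probabilities
)
clf.fit(X, y)
probs = clf.predict_proba(X[:5])       # per-class probabilities for 5 samples
```

Note that sklearn's Platt scaling and e1071's probability fitting differ in cross-validation details, so calibrated odds will not match exactly across the two libraries.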