Multiclass Anomaly Detector: the CS++ Support Vector Machine

Authors: Alistair Shilton, Sutharshan Rajasegarar, Marimuthu Palaniswami

JMLR 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, experimental results are presented to demonstrate the effectiveness of the algorithm for both simulated and real-world data.
Researcher Affiliation Academia Alistair Shilton (EMAIL), Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia; Sutharshan Rajasegarar (EMAIL), School of Information Technology, Deakin University, Geelong, Australia; Marimuthu Palaniswami (EMAIL), Department of Electrical and Electronic Engineering, The University of Melbourne, Melbourne, Australia
Pseudocode No The paper describes the proposed CS++-SVM algorithm using mathematical formulations and textual descriptions in sections 4 and 5, but it does not present a clearly labeled pseudocode block or algorithm steps.
Open Source Code Yes All experiments were run using SVMHeavy (Shilton, 2001-2020), which is an active-set based optimisation library written in C++ (see Shilton et al. (2005) for details). Alternative optimisation libraries include Nandan et al. (2014); Claesen et al. (2014). Footnote 8: Code available at https://github.com/apshsh/SVMHeavy.
Open Datasets Yes In this experiment the performance of the CS++-SVM and hybrid schemes were compared using the UCI (Dua and Graff, 2017) Optical Recognition of Handwritten Digits Data Set (DIG) (Kaynak, 1995; Alpaydin and Kaynak, 1998) for training and Kassel and Taskar's OCR data set of handwritten lower-case characters (CHR) (Kassel, 1995; Taskar et al., 2004) as our anomaly test set. Footnote 11: Available at http://ai.stanford.edu/~btaskar/ocr/. We have considered three data sets from the UCI repository (Dua and Graff, 2017) here, namely human activity recognition (HAR (Anguita et al., 2013)), daily and sports activities (DSA (Altun et al., 2010; Barshan and Yüksek, 2014; Altun and Barshan, 2010)), and forest type mapping (forest (Johnson et al., 2012)).
Dataset Splits Yes 150 training vectors were generated for each class, giving a total training set size of 750 training vectors. In addition to this a testing set of 10000 vectors (2000 from each class) was generated. The DIG data set contains 3823 training instances (of which we used 2000 due to memory restrictions) and 1797 testing instances over 10 classes (digits 0-9); For the DSA data set we treat activities 1-5 as class 1 (known), activities 6-11 as class 2 (known), and activities 12-19 as our unknown (anomaly) set; where 5000 points from class 1 and 5000 points from class 2 were randomly selected for the training set.
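The DSA relabelling and subsampling described above can be sketched as follows. This is a minimal numpy sketch on synthetic stand-in labels: the array sizes and the per-activity counts are assumptions for illustration; only the activity-to-class mapping and the 5000-points-per-known-class draw come from the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in activity labels 1-19 (real DSA data would be loaded from the
# UCI repository; 1600 points per activity is a hypothetical count).
activity = np.repeat(np.arange(1, 20), 1600)

# Relabel per the paper: activities 1-5 -> class 1 (known),
# activities 6-11 -> class 2 (known), activities 12-19 -> anomaly (-1).
label = np.where(activity <= 5, 1, np.where(activity <= 11, 2, -1))

# Training set: 5000 randomly selected points from each known class.
train_idx = np.concatenate([
    rng.choice(np.flatnonzero(label == c), size=5000, replace=False)
    for c in (1, 2)
])
```

In practice `train_idx` would then be used to index the corresponding feature matrix; the anomaly points (label -1) are held out entirely for testing.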
Hardware Specification No The paper mentions that "All experiments were run using SVMHeavy (Shilton, 2001-2020), which is an active-set based optimisation library written in C++", but it does not specify any hardware details like CPU, GPU, or memory.
Software Dependencies Yes All experiments were run using SVMHeavy (Shilton, 2001-2020), which is an active-set based optimisation library written in C++ (see Shilton et al. (2005) for details).
Experiment Setup Yes For these experiments we have used the radial basis kernel and polynomial kernel functions: K_rbf(x, y) = exp(−γ‖x − y‖²) and K_poly(x, y) = (1 + ⟨x, y⟩)^d, where the kernel parameters are, respectively, 10⁻² ≤ γ ≤ 10² and d ∈ {1, 2, 3, 4, 5}. The trade-off parameter C was selected from 10⁻² ≤ C/N ≤ 10². Parameter selection of C and d or γ (i.e. classification related training parameters) was carried out using a grid search to minimise leave-one-out error measured on the training set. For this experiment we have chosen the RBF kernel. Representative results for the CS++-SVM and hybrid schemes are shown in Figure 7, where ν = 0.05, C = 1 and γ = 10 (C and γ selected to minimise leave-one-out error, ν selected arbitrarily). For consistency with previous experiments we have chosen ν = 0.05 for our experiments.
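The two kernel functions and the hyperparameter grid quoted above can be sketched as follows. This is a minimal numpy sketch: the kernel definitions and the parameter ranges come from the quoted setup, but the grid resolution (5 log-spaced points per range) and the function names are assumptions, since the paper's excerpt gives only the endpoints.

```python
import numpy as np

def k_rbf(x, y, gamma):
    # Radial basis kernel: exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def k_poly(x, y, d):
    # Polynomial kernel: (1 + <x, y>)^d
    return (1.0 + np.dot(x, y)) ** d

# Hyperparameter grids over the quoted ranges; the number of grid
# points (5 per decade-spanning range) is a hypothetical choice.
gammas = np.logspace(-2, 2, 5)       # 1e-2 <= gamma <= 1e2
degrees = [1, 2, 3, 4, 5]            # d in {1, ..., 5}
Cs = np.logspace(-2, 2, 5)           # 1e-2 <= C/N <= 1e2
```

A grid search as described would then loop over (C, γ) or (C, d) pairs, scoring each by leave-one-out error on the training set and keeping the minimiser.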