Efficient Convex Algorithms for Universal Kernel Learning
Authors: Aleksandr Talitckii, Brendon Colbert, Matthew M. Peet
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, when applied to benchmark data, the algorithm demonstrates the potential for significant improvement in accuracy over typical (but non-convex) approaches such as Neural Nets and Random Forest with similar or better computation time. Numerical experiments confirm that the FW-based algorithm is approximately 100 times faster than the previous SDP algorithm from Colbert and Peet (2020). Finally, 12 large and randomly selected data sets were used to test accuracy of the proposed algorithms compared to 8 existing state-of-the-art alternatives yielding uniform increases in accuracy with similar or reduced computational complexity. |
| Researcher Affiliation | Academia | Aleksandr Talitckii EMAIL Department of Mechanical and Aerospace Engineering Arizona State University Tempe, AZ 85281-1776, USA Brendon Colbert EMAIL Department of Mechanical and Aerospace Engineering Arizona State University Tempe, AZ 85281-1776, USA Matthew M. Peet EMAIL Department of Mechanical and Aerospace Engineering Arizona State University Tempe, AZ 85281-1776, USA |
| Pseudocode | Yes | Algorithm 1: The Frank-Wolfe Algorithm for Matrices; Algorithm 2: Proposed FW Algorithm for GKL; Algorithm 3: APD algorithm; Algorithm 4: APD algorithm; Algorithm 5: APD P Subroutine; Algorithm 6: Final version of GKL. |
| Open Source Code | Yes | Implementation and documentation of this method is described in Appendix D.1 and is publicly available via Github (Colbert et al., 2021); This software is available from Github (Colbert et al., 2021). |
| Open Datasets | Yes | Table 2: References for the data sets used in Section 8. All data sets are available on the UCI Machine Learning Repository or from the LIBSVM database. ... (Name, Type, Source, References): Liver, Classification, UCI, McDermott and Forsyth (2016); Cancer, Classification, UCI, Wolberg et al. (1990); Heart, Classification, UCI, No Associated Publication; Pima, Classification, UCI, No Associated Publication; Hill Valley, Classification, UCI, No Associated Publication; Shill Bid, Classification, UCI, Alzahrani and Sadaoui (2018, 2020); Abalone, Classification, UCI, Waugh (1995); Transfusion, Classification, UCI, Yeh et al. (2009); German, Classification, LIBSVM, No Associated Publication; Four Class, Classification, LIBSVM, Ho and Kleinberg (1996); Gas Turbine, Regression, UCI, Kaya et al. (2019); Airfoil, Regression, UCI, Brooks et al. (1989); CCPP, Regression, UCI, Tüfekci (2014) and Kaya et al. (2012); CA, Regression, LIBSVM, Pace and Barry (1997b); Space, Regression, LIBSVM, Pace and Barry (1997a); Boston Housing, Regression, LIBSVM, Harrison and Rubinfeld (1978) |
| Dataset Splits | Yes | In both classification and regression, the accuracy metric uses 5 random divisions of the data into test sets (m_t samples = 20% of data) and training sets (m samples = 80% of data). |
| Hardware Specification | Yes | All tests are run on a desktop with an Intel i7-5960X CPU at 3.00 GHz and 128 GB of RAM. |
| Software Dependencies | No | The paper mentions several software components, such as the "LibSVM implementation", "LAPACK implementation", "MATLAB's patternnet for classification and feedforwardnet for regression", "scikit-learn python toolbox", "XGBoost algorithm", and "MKLpy python package". However, none of these mentions includes a specific version number, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | [TKL] Algorithm 2 with N as defined in Eqn. (11), where Z_d is a vector of monomials of degree d = 1 or less. The regression problem is posed using ϵ = .1. The data is scaled so that x_i ∈ [0, 1]^n and [a, b] = [0−δ, 1+δ]^n, where δ ≥ 0 and C in the kernel learning problem are chosen by 2-fold cross-validation. ... [SMKL] SimpleMKL proposed in Rakotomamonjy et al. (2008) with a standard selection of Gaussian and polynomial kernels with bandwidths arbitrarily chosen between .5 and 10 and polynomial degrees one through three, yielding approximately 13(n + 1) kernels. The regression and classification problems are posed using ϵ = .1 and C is chosen by 2-fold cross-validation; [NNet] A neural network with 3 hidden layers of size 50 using MATLAB's patternnet for classification and feedforwardnet for regression, where learning is halted after the error on a validation set decreased sequentially 50 times; [RF] The Random Forest algorithm as in Breiman (2004) as implemented in the scikit-learn python toolbox (see Pedregosa et al., 2011) for classification and regression. Between 50 and 650 trees (in 50-tree intervals) are selected using 2-fold cross-validation; [XGBoost] The XGBoost algorithm as implemented in Chen and Guestrin (2016) for classification and regression. Between 50 and 650 trees (in 50-tree intervals) are selected using 2-fold cross-validation; |
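The accuracy protocol quoted in the Dataset Splits row (5 random divisions into 80% training / 20% test) can be sketched as below. This is an illustrative reconstruction, not the authors' code: the function name `five_split_eval`, the `fit_predict` callback, and the use of NumPy are assumptions of this sketch.

```python
import numpy as np

def five_split_eval(X, y, fit_predict, seed=0):
    """Sketch of the paper's reported protocol: 5 random 80/20
    train/test divisions; returns one accuracy score per split."""
    rng = np.random.default_rng(seed)
    m = len(X)
    scores = []
    for _ in range(5):
        idx = rng.permutation(m)
        cut = int(0.8 * m)              # m training samples = 80% of data
        tr, te = idx[:cut], idx[cut:]   # remaining m_t samples = 20%
        preds = fit_predict(X[tr], y[tr], X[te])
        scores.append(float(np.mean(preds == y[te])))  # classification accuracy
    return scores
```

For regression, the `np.mean(preds == y[te])` accuracy line would be replaced by a regression error metric (e.g. mean squared error); the paper's table above only specifies the split scheme, not the scoring code.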