Parsimonious Online Learning with Kernels via Sparse Projections in Function Space

Authors: Alec Koppel, Garrett Warnell, Ethan Stump, Alejandro Ribeiro

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate POLK for kernel multi-class logistic regression and kernel hinge-loss classification on three canonical data sets: a synthetic Gaussian mixture model, the MNIST hand-written digits, and the Brodatz texture database. On all three tasks, we observe a favorable trade-off of objective function evaluation, classification performance, and complexity of the nonparametric regressor extracted by the proposed method.
Researcher Affiliation | Collaboration | Alec Koppel (EMAIL), Computational and Information Sciences Directorate, U.S. Army Research Laboratory, Adelphi, MD 20783, USA; Garrett Warnell (EMAIL), Computational and Information Sciences Directorate, U.S. Army Research Laboratory, Adelphi, MD 20783, USA; Ethan Stump (EMAIL), Computational and Information Sciences Directorate, U.S. Army Research Laboratory, Adelphi, MD 20783, USA; Alejandro Ribeiro (EMAIL), Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA
Pseudocode | Yes | Algorithm 1: Destructive Kernel Orthogonal Matching Pursuit (KOMP); Algorithm 2: Parsimonious Online Learning with Kernels (POLK)
Open Source Code | No | The paper includes license information for the paper itself (CC-BY 4.0), but does not provide any specific statement or link for the open-source code of the methodology described.
Open Datasets | Yes | We evaluate POLK for kernel multi-class logistic regression and kernel hinge-loss classification on three canonical data sets: a synthetic Gaussian mixture model, the MNIST hand-written digits, and the Brodatz texture database. On all three tasks, we observe a favorable trade-off of objective function evaluation, classification performance, and complexity of the nonparametric regressor extracted by the proposed method. mnist: The mnist data set we use is the popular MNIST data set (Lecun and Cortes, 2009). brodatz: We generated the brodatz data set using a subset of the images provided in Brodatz (1966).
Dataset Splits | Yes | multidist: In a manner similar to (Zhu and Hastie, 2005), we generate the multidist data set using a set of Gaussian mixture models. The data set consists of N = 5000 feature-label pairs for training and 2500 for testing. mnist: The mnist data set we use is the popular MNIST data set (Lecun and Cortes, 2009), which consists of N = 60000 feature-label pairs for training and 10000 for testing. brodatz: We generated the brodatz data set using a subset of the images provided in Brodatz (1966). [...] We then randomly selected N = 10000 feature-label pairs for training and 5000 for testing.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications used for running its experiments.
Software Dependencies | No | The paper mentions several software tools and libraries (e.g., LIBSVM, an L-BFGS solver), but does not provide specific version numbers for any of them, which is necessary for reproducibility.
Experiment Setup | Yes | For POLK, we used the following specific parameter values: we select the Gaussian/RBF kernel with bandwidth σ² = 0.6, constant learning rate η = 6.0, parsimony constant K = 0.04, and regularization constant λ = 10⁻⁶. Further, we processed streaming samples in mini-batches of size 32. For BSGD, Dual, NORMA, NPBSGD, and BPA, we used the same σ² and λ, but achieved the best results with a smaller constant learning rate η = 1.0 (perhaps due, in part, to the fact that it is unclear how to fold mini-batching into their implementations). For Dual, we set k = 3, which determines how many model points we search over to omit from the dictionary and instead contribute to the random feature functional representation. For NPBSGD, we set the Bernoulli parameter p = min(1, β/t) with β = (45/70)N as in the experiments of Le et al. (2016b). In order to compare with POLK, we set the competing methods' pre-specified model orders or budget parameters to 16, i.e., the steady-state model order of POLK parameterized with the value of K specified above.
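The hyperparameters quoted above can be gathered into a small configuration sketch. This is a hypothetical illustration — the variable names and the `komp_budget` helper are not from any released implementation — and it assumes the KOMP compression budget scales as ε = Kη², a constant-step choice consistent with the paper's use of a "parsimony constant" K tied to the learning rate.

```python
# Hypothetical POLK hyperparameter configuration; all names are
# illustrative, not taken from any released code.
POLK_CONFIG = {
    "kernel": "rbf",       # Gaussian/RBF kernel
    "sigma_sq": 0.6,       # kernel bandwidth sigma^2
    "eta": 6.0,            # constant learning rate
    "K": 0.04,             # parsimony constant
    "lam": 1e-6,           # regularization constant lambda
    "minibatch_size": 32,  # streaming mini-batch size
}

def komp_budget(config):
    """Compression budget for the KOMP projection step, under the
    assumed constant-step relation eps = K * eta**2 (see lead-in)."""
    return config["K"] * config["eta"] ** 2

# eps = K * eta^2 = 0.04 * 36
print(komp_budget(POLK_CONFIG))
```

Under this assumed relation, a larger parsimony constant K permits a looser projection and hence a smaller steady-state dictionary, which is how the reported K = 0.04 yields the model order of 16 used as the budget for the competing methods.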