Large Scale Online Kernel Learning

Authors: Jing Lu, Steven C.H. Hoi, Jialei Wang, Peilin Zhao, Zhi-Yong Liu

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The encouraging results of our experiments on large-scale datasets validate the effectiveness and efficiency of the proposed algorithms, making them potentially more practical than the family of existing budget online kernel learning approaches. Keywords: online learning, kernel approximation, large scale machine learning"
Researcher Affiliation | Academia | Jing Lu (EMAIL) and Steven C.H. Hoi (EMAIL), School of Information Systems, Singapore Management University, 80 Stamford Road, Singapore 178902; Jialei Wang (EMAIL), Department of Computer Science, University of Chicago, 5050 S Lake Shore Drive Apt S2009, Chicago IL, USA 60637; Peilin Zhao (EMAIL), Institute for Infocomm Research, A*STAR, 1 Fusionopolis Way, 21-01 Connexis, Singapore 138632; Zhi-Yong Liu (EMAIL), State Key Lab of Management and Control for Complex System, Chinese Academy of Sciences, No. 95 Zhongguancun East Road, Haidian District, Beijing, China 100190
Pseudocode | Yes | Algorithm 1 (FOGD): Fourier Online Gradient Descent for Binary Classification; Algorithm 2 (NOGD): Nyström Online Gradient Descent for Binary Classification; Algorithm 3 (MFOGD): Multi-class Fourier Online Gradient Descent; Algorithm 4 (MNOGD): Multi-class Nyström Online Gradient Descent; Algorithm 5 (FOGD-R): Fourier Online Gradient Descent for Regression; Algorithm 6 (NOGD-R): Nyström Online Gradient Descent for Regression
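The core idea of Algorithm 1 (FOGD) is to approximate the RBF kernel with random Fourier features and then run plain online gradient descent in the resulting explicit feature space. A minimal sketch of that idea is below; the function name, hinge-loss choice, and `[cos, sin]` feature construction are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def fogd_train(X, y, D=400, sigma=8.0, eta=0.2, seed=0):
    """Sketch of Fourier Online Gradient Descent for binary classification.

    Approximates the RBF kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    with D random Fourier components, then runs online gradient descent
    on the hinge loss over the 2D-dimensional explicit features.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random frequencies drawn from the Fourier transform of the RBF kernel
    u = rng.normal(scale=1.0 / sigma, size=(D, d))
    w = np.zeros(2 * D)
    mistakes = 0
    for x, label in zip(X, y):          # label in {-1, +1}
        z = np.concatenate([np.cos(u @ x), np.sin(u @ x)]) / np.sqrt(D)
        score = w @ z
        if np.sign(score) != label:
            mistakes += 1
        if label * score < 1:           # hinge-loss subgradient step
            w += eta * label * z
    return w, mistakes
```

Because the feature map is fixed and finite-dimensional, each round costs O(Dd) time and O(D) memory, independent of the number of support vectors, which is the source of the scalability claim.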
Open Source Code | Yes | All the source code and datasets for our experiments in this work can be downloaded from our project web page: http://LSOKL.stevenhoi.org/.
Open Datasets | Yes | All of them can be downloaded from the LIBSVM website, the UCI machine learning repository, and the KDDCUP competition site.
Dataset Splits | Yes | We follow the original training/test splits in LIBSVM. For the KDD datasets, a random 4/1 split is used. For each dataset, all experiments were repeated 20 times using different random permutations of the instances.
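The evaluation protocol above (a random 4/1 train/test split, then 20 random permutations of the training stream for the online runs) can be sketched as follows; the function name and return shape are illustrative assumptions.

```python
import numpy as np

def split_and_permute(n, runs=20, seed=0):
    """Sketch of the described protocol: hold out 1/5 of the n instances
    as a test set (a 4/1 split) and generate `runs` random permutations
    of the remaining training indices, one per online-learning run."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = n // 5                         # 4/1 split -> 1/5 held out
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    perms = [rng.permutation(train_idx) for _ in range(runs)]
    return train_idx, test_idx, perms
```

Averaging results over the 20 permutations reduces the variance that the arrival order of instances induces in any online learner.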
Hardware Specification | No | All the algorithms were implemented in C++, and conducted on a Windows machine with a CPU of 3.0 GHz. All algorithms are implemented in Matlab R2013b, on a Windows machine with a 3.0 GHz CPU, 6 cores.
Software Dependencies | Yes | All algorithms are implemented in Matlab R2013b, on a Windows machine with a 3.0 GHz CPU, 6 cores.
Experiment Setup | Yes | To make a fair comparison of algorithms with different parameters, all the parameters, including the regularization parameter (C in LIBSVM, λ in Pegasos), the learning rate (η in FOGD and NOGD), and the RBF kernel width (σ), are optimized by a standard 5-fold cross validation on the training datasets. The Gaussian kernel bandwidth is set to 8. The step size η in all the online gradient descent based algorithms is chosen through a random search in the range {2, 0.2, ..., 0.0002}. We adopt the same budget size B = 100 for NOGD and the other budget algorithms. In the FOGD algorithm, D = ρf B, where ρf > 0 is a predefined parameter that controls the number of random Fourier components. In the NOGD algorithm, k = ρn B, where 0 < ρn < 1 is a predefined parameter that controls the accuracy of the matrix approximation. We set ρf = 4 and ρn = 0.2 and evaluate their influence on performance in the following discussion.
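The NOGD side of the setup (budget B = 100, rank k = ρn·B = 20) builds a rank-k Nyström feature map from a small support set and then runs linear online gradient descent, analogously to FOGD. A minimal sketch of that idea is below, assuming for simplicity that the support set is just the first B examples of the stream (the paper's algorithm uses a warm-start phase to pick them); names and loss choice are illustrative.

```python
import numpy as np

def nogd_train(X, y, B=100, k=20, sigma=8.0, eta=0.2):
    """Sketch of Nyström Online Gradient Descent for binary classification.

    A rank-k eigendecomposition of the B x B kernel matrix over the
    support set defines an explicit k-dimensional feature map; online
    gradient descent on the hinge loss is then run in that space.
    """
    def rbf(A, C):
        d2 = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    anchors = X[:B]                         # simplified support set
    K = rbf(anchors, anchors)
    vals, vecs = np.linalg.eigh(K)
    top = np.argsort(vals)[::-1][:k]        # top-k eigenpairs of K
    # Nyström map: x -> diag(vals^-1/2) V^T k(anchors, x)
    M = vecs[:, top] / np.sqrt(vals[top])
    w = np.zeros(k)
    mistakes = 0
    for x, label in zip(X, y):              # label in {-1, +1}
        z = M.T @ rbf(x[None, :], anchors).ravel()
        score = w @ z
        if np.sign(score) != label:
            mistakes += 1
        if label * score < 1:               # hinge-loss subgradient step
            w += eta * label * z
    return w, mistakes
```

After the one-off O(B^3) eigendecomposition, each round costs O(Bd + Bk), so the choice ρn = 0.2 trades approximation accuracy against per-round cost exactly as the setup paragraph describes.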