Memory Efficient Kernel Approximation

Authors: Si Si, Cho-Jui Hsieh, Inderjit S. Dhillon

JMLR 2017

Reproducibility variables, each with its result and the supporting excerpt from the paper:
Research Type: Experimental. "As an example, on the covtype dataset with half a million samples, MEKA takes around 70 seconds and uses less than 80 MB of memory on a single machine to achieve 10% relative approximation error, while standard Nyström approximation is about 6 times slower and uses more than 400 MB of memory to achieve a similar approximation. We also present extensive experiments on applying MEKA to speed up kernel ridge regression."
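The excerpt above compares MEKA against standard Nyström approximation on relative approximation error. As a point of reference, here is a minimal numpy sketch of plain Nyström approximation with uniformly sampled landmarks; the toy data, landmark count, and kernel parameter are illustrative, not the paper's settings:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # Pairwise squared distances, then the RBF kernel exp(-gamma * d^2).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, m=100, gamma=0.1, seed=0):
    """Rank-m Nystrom approximation K ~= C @ pinv(W) @ C.T,
    where C holds kernel values against m sampled landmark points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)   # n x m
    W = C[idx]                         # m x m block on the landmarks
    return C @ np.linalg.pinv(W) @ C.T

# Relative approximation error ||K - K_hat||_F / ||K||_F, the metric
# the excerpt reports (10% for MEKA on covtype).
X = np.random.default_rng(0).normal(size=(500, 10))
K = rbf_kernel(X, X)
K_hat = nystrom(X, m=100)
err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

MEKA goes beyond this baseline by clustering the data and approximating the kernel block-by-block, which is where its memory savings come from; the sketch only shows the standard Nyström method it is compared against.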
Researcher Affiliation: Collaboration. Si Si (Google Research, Mountain View, CA, USA); Cho-Jui Hsieh (Departments of Computer Science and Statistics, University of California, Davis, Davis, CA 95616, USA); Inderjit S. Dhillon (Department of Computer Science, University of Texas at Austin, Austin, TX 78701, USA).
Pseudocode: Yes. "Algorithm 1: Memory Efficient Kernel Approximation (MEKA)"
Open Source Code: Yes. "The code for MEKA is available at www.cs.utexas.edu/~ssi/meka/."
Open Datasets: Yes. "The three datasets can be downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets or the UCI data repository."
Dataset Splits: Yes. "We use a random 80%-20% split for covtype, and the original training/testing split for the other datasets."
Hardware Specification: Yes. "All the experiments are conducted on an Intel 2.66GHz CPU with 8 GB RAM."
Software Dependencies: No. The paper mentions using LIBSVM and LIBLINEAR but does not specify version numbers for these software components.
Experiment Setup: Yes. "We chose the balancing parameter C and kernel parameter γ by 5-fold cross validation on a grid of points: C = [2^-10, 2^-9, ..., 2^10] and γ = [2^-10, ..., 2^10] for ijcnn1, census, and covtype. For BKA-SVM, we set the number of clusters to be 64 for these three datasets. There is a tradeoff between the number of clusters and prediction accuracy. If we increase the number of clusters, BKA-SVM will become faster, but the prediction accuracy will mostly decrease. On the other hand, if the number of clusters is set smaller, BKA-SVM can achieve higher accuracy (in most cases), while taking more time to train. The following are the parameter settings for the other methods in Table 2: the rank is set to 3000 in LLSVM; the number of Fourier features is 3000 in Fastfood; the tolerance in the stopping condition for LIBSVM is set to 10^-3 (the default setting of LIBSVM)."
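The setup above selects hyperparameters by 5-fold cross validation over a grid of powers of two. A minimal numpy sketch of that selection procedure, applied here to kernel ridge regression (which the paper also accelerates with MEKA) rather than LIBSVM; the toy data, the coarser sub-grid, and the helper functions are illustrative assumptions, not the paper's code:

```python
import numpy as np

def rbf(X, Y, gamma):
    # RBF kernel matrix between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_cv_error(X, y, gamma, lam, folds=5):
    """Mean squared error of kernel ridge regression under k-fold CV."""
    n = len(X)
    idx = np.arange(n)
    errs = []
    for f in range(folds):
        te = idx[f::folds]            # every folds-th point as the held-out fold
        tr = np.delete(idx, te)
        K = rbf(X[tr], X[tr], gamma)
        alpha = np.linalg.solve(K + lam * np.eye(len(tr)), y[tr])
        pred = rbf(X[te], X[tr], gamma) @ alpha
        errs.append(np.mean((pred - y[te]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=150)

# Coarse sub-grid of the paper's 2^-10 ... 2^10 range, kept small here;
# the full grid is the same idea with more points.
grid = [2.0 ** p for p in (-4, -2, 0, 2, 4)]
best = min((krr_cv_error(X, y, g, l), g, l) for g in grid for l in grid)
```

`best` then holds the lowest cross-validated error together with the (γ, λ) pair that achieved it, mirroring how the paper picks (C, γ) for SVM training.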