Memory Efficient Kernel Approximation

Authors: Si Si, Cho-Jui Hsieh, Inderjit S. Dhillon

JMLR 2017

Reproducibility variables, each with its result and the supporting excerpt from the paper:
Research Type: Experimental. "As an example, on the covtype dataset with half a million samples, MEKA takes around 70 seconds and uses less than 80 MB of memory on a single machine to achieve 10% relative approximation error, while standard Nyström approximation is about 6 times slower and uses more than 400 MB of memory to achieve a similar approximation. We also present extensive experiments on applying MEKA to speed up kernel ridge regression."
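The excerpt above compares MEKA against standard Nyström approximation on relative approximation error. As a point of reference, here is a minimal numpy sketch of plain Nyström approximation with uniformly sampled landmarks; the toy data, landmark count, and kernel parameter are illustrative, not the paper's settings:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # Pairwise squared distances, then the RBF kernel exp(-gamma * d^2).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, m=100, gamma=0.1, seed=0):
    """Rank-m Nystrom approximation K ~= C @ pinv(W) @ C.T,
    where C holds kernel values against m sampled landmark points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)   # n x m
    W = C[idx]                         # m x m block on the landmarks
    return C @ np.linalg.pinv(W) @ C.T

# Relative approximation error ||K - K_hat||_F / ||K||_F, the metric
# the excerpt reports (10% for MEKA on covtype).
X = np.random.default_rng(0).normal(size=(500, 10))
K = rbf_kernel(X, X)
K_hat = nystrom(X, m=100)
err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

MEKA goes beyond this baseline by clustering the data and approximating the kernel block-by-block, which is where its memory savings come from; the sketch only shows the standard Nyström method it is compared against.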
Researcher Affiliation: Collaboration. Si Si (Google Research, Mountain View, CA, USA); Cho-Jui Hsieh (Departments of Computer Science and Statistics, University of California, Davis, Davis, CA 95616, USA); Inderjit S. Dhillon (Department of Computer Science, University of Texas at Austin, Austin, TX 78701, USA).
Pseudocode: Yes. "Algorithm 1: Memory Efficient Kernel Approximation (MEKA)"
Open Source Code: Yes. "The code for MEKA is available at www.cs.utexas.edu/~ssi/meka/."
Open Datasets: Yes. "The three datasets can be downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets or the UCI data repository."
Dataset Splits: Yes. "We use a random 80%-20% split for covtype, and the original training/testing split for the other datasets."
Hardware Specification: Yes. "All the experiments are conducted on an Intel 2.66GHz CPU with 8 GB RAM."
Software Dependencies: No. The paper mentions using LIBSVM and LIBLINEAR but does not specify version numbers for these software components.
Experiment Setup: Yes. "We chose the balancing parameter C and kernel parameter γ by 5-fold cross validation on a grid of points: C = [2^-10, 2^-9, ..., 2^10] and γ = [2^-10, ..., 2^10] for ijcnn1, census, and covtype. For BKA-SVM, we set the number of clusters to be 64 for these three datasets. There is a tradeoff between the number of clusters and prediction accuracy. If we increase the number of clusters, BKA-SVM will become faster, but the prediction accuracy will mostly decrease. On the other hand, if the number of clusters is set smaller, BKA-SVM can achieve higher accuracy (in most cases), while taking more time to train. The following are the parameter settings for the other methods in Table 2: the rank is set to 3000 in LLSVM; the number of Fourier features is 3000 in Fastfood; the tolerance in the stopping condition for LIBSVM is set to 10^-3 (the default setting of LIBSVM)."
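The setup above selects hyperparameters by 5-fold cross validation over a grid of powers of two. A minimal numpy sketch of that selection procedure, applied here to kernel ridge regression (which the paper also accelerates with MEKA) rather than LIBSVM; the toy data, the coarser sub-grid, and the helper functions are illustrative assumptions, not the paper's code:

```python
import numpy as np

def rbf(X, Y, gamma):
    # RBF kernel matrix between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_cv_error(X, y, gamma, lam, folds=5):
    """Mean squared error of kernel ridge regression under k-fold CV."""
    n = len(X)
    idx = np.arange(n)
    errs = []
    for f in range(folds):
        te = idx[f::folds]            # every folds-th point as the held-out fold
        tr = np.delete(idx, te)
        K = rbf(X[tr], X[tr], gamma)
        alpha = np.linalg.solve(K + lam * np.eye(len(tr)), y[tr])
        pred = rbf(X[te], X[tr], gamma) @ alpha
        errs.append(np.mean((pred - y[te]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=150)

# Coarse sub-grid of the paper's 2^-10 ... 2^10 range, kept small here;
# the full grid is the same idea with more points.
grid = [2.0 ** p for p in (-4, -2, 0, 2, 4)]
best = min((krr_cv_error(X, y, g, l), g, l) for g in grid for l in grid)
```

`best` then holds the lowest cross-validated error together with the (γ, λ) pair that achieved it, mirroring how the paper picks (C, γ) for SVM training.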