Improved Random Features for Dot Product Kernels
Authors: Jonas Wacker, Motonobu Kanagawa, Maurizio Filippone
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We describe the improvements brought by these contributions with extensive experiments on a variety of tasks and datasets. Keywords: Random features, randomized sketches, dot product kernels, polynomial kernels, large scale learning |
| Researcher Affiliation | Academia | Jonas Wacker, Motonobu Kanagawa (Data Science Department, EURECOM, France); Maurizio Filippone (Statistics Program, KAUST, Saudi Arabia) |
| Pseudocode | Yes | Algorithm 1: Real and Complex Tensor SRHT Algorithm 2: Incremental Algorithm Algorithm 3: Extended Incremental Algorithm Algorithm 4: Improved Random Maclaurin (RM) Features |
| Open Source Code | Yes | Software package. We provide a GitHub repository with modern implementations for all the methods studied in this work, supporting GPU acceleration and automatic differentiation in PyTorch (Paszke et al., 2019). Since version 1.8, PyTorch natively supports numerous linear algebra operations on complex numbers. The same is true for NumPy (Harris et al., 2020) and TensorFlow (Abadi et al., 2016). Our code is available at: https://github.com/joneswack/dp-rfs |
| Open Datasets | Yes | All the datasets come from the UCI benchmark (Dua and Graff, 2017) except for Cod-RNA (Uzilov et al., 2006), Fashion-MNIST (Xiao et al., 2017), and MNIST (Lecun et al., 1998). |
| Dataset Splits | Yes | The train/test split is 90/10 and is recomputed for every random seed for the UCI datasets; otherwise it is predefined. For each dataset, we use its random subsets of size m = min(5000, Ntrain) and m = min(5000, Ntest) to define training and test data in an experiment, respectively, where Ntrain and Ntest are the sizes of the original training and test datasets. |
| Hardware Specification | Yes | We recorded the time measurements on an NVIDIA P100 GPU and PyTorch version 1.10 with native complex linear algebra support. |
| Software Dependencies | Yes | We recorded the time measurements on an NVIDIA P100 GPU and PyTorch version 1.10 with native complex linear algebra support. The PyTorch 1.8 release notes are available at: https://github.com/pytorch/pytorch/releases/tag/v1.8.0 |
| Experiment Setup | Yes | For the optimized Maclaurin approach in Algorithm 3, we set pmin = 2 and pmax = 10. We use the training subset Xsub = {x1, . . . , xm} to precompute the U-statistics in Eq. (49) and Eq. (50). Regularization parameters. We select the regularization parameter in GP classification and regression by a training-validation procedure. That is, we use 90% of the training data for training and the remaining 10% for validation, and select the regularization parameter that minimizes the MNLL on the validation set. For GP classification, we choose the regularization parameter from the range α ∈ {10^-5, . . . , 10^0}. For GP regression, we choose the noise variance from the range σ²_noise ∈ {2^-15, . . . , 2^15}. |
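
The dataset protocol quoted above (a 90/10 train/test split recomputed per random seed, then subsets of size m = min(5000, N) for training and test) could be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name and signature are hypothetical.

```python
import numpy as np

def make_experiment_split(X, y, seed, test_frac=0.1, m_cap=5000):
    """Sketch of the paper's data protocol: a 90/10 train/test split
    recomputed for every random seed, followed by random subsets of
    size m = min(5000, N_train) and m = min(5000, N_test)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    test_idx, train_idx = perm[:n_test], perm[n_test:]

    def subsample(idx):
        # cap the subset size at m_cap, as in m = min(5000, N)
        m = min(m_cap, len(idx))
        return rng.choice(idx, size=m, replace=False)

    tr, te = subsample(train_idx), subsample(test_idx)
    return (X[tr], y[tr]), (X[te], y[te])
```

Recomputing the split from the seed keeps every experiment self-contained: each seed fully determines both the split and the subsampling.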
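
The training-validation selection of the noise variance described in the setup row could look roughly like the sketch below. The closed-form Bayesian linear regression is a stand-in for the paper's GP models, and the function name is an assumption; the grid 2^-15, …, 2^15 and the 90/10 validation split follow the quoted text.

```python
import numpy as np

def select_noise_variance(X, y, seed=0):
    """Sketch of the training-validation procedure: hold out 10% of the
    training data for validation, fit on the rest for each candidate
    noise variance, and keep the value with the lowest validation MNLL
    (mean negative log-likelihood). Bayesian linear regression with a
    unit-variance weight prior stands in for the GP regression model."""
    grid = 2.0 ** np.arange(-15, 16)  # sigma^2_noise in {2^-15, ..., 2^15}
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_val = max(1, len(X) // 10)
    val, tr = perm[:n_val], perm[n_val:]
    Xtr, ytr, Xval, yval = X[tr], y[tr], X[val], y[val]

    best_var, best_mnll = None, np.inf
    d = X.shape[1]
    for var in grid:
        # weight posterior: A = X'X + var * I, mean w = A^{-1} X'y
        A = Xtr.T @ Xtr + var * np.eye(d)
        w = np.linalg.solve(A, Xtr.T @ ytr)
        mean = Xval @ w
        # diagonal of the predictive variance: var + var * x' A^{-1} x
        cov = var + var * np.einsum("ij,ji->i", Xval, np.linalg.solve(A, Xval.T))
        mnll = np.mean(0.5 * np.log(2 * np.pi * cov)
                       + 0.5 * (yval - mean) ** 2 / cov)
        if mnll < best_mnll:
            best_var, best_mnll = var, mnll
    return best_var
```

For GP classification the same loop would run over α ∈ {10^-5, …, 10^0} with the classifier's validation MNLL as the score.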