Hierarchically Compositional Kernels for Scalable Nonparametric Learning

Authors: Jie Chen, Haim Avron, Vikas Sindhwani

JMLR 2017

Reproducibility Variable Result LLM Response
Research Type Experimental Although other randomized approximate kernels entail a similar complexity, empirical results show that the proposed kernel achieves matching performance with a smaller r. We present comprehensive experiments that demonstrate the effective use of the proposed kernel on data sizes up to the order of millions. Keywords: Hierarchical kernels, Nonparametric Learning
Researcher Affiliation Collaboration Jie Chen EMAIL IBM Thomas J. Watson Research Center Haim Avron EMAIL Tel Aviv University Vikas Sindhwani EMAIL Google Brain, NYC
Pseudocode Yes For completeness, we summarize the pseudocode in Algorithm 1 as a reference for computer implementation. Algorithm 1: Computing y = Ab; Algorithm 2: Computing A = A^{-1}; Algorithm 3: Computing z = w^T k_hierarchical(X, x) for x not in X
Open Source Code No The paper states: "The programs were implemented in C++. The linear algebra routines (BLAS and LAPACK) were linked against the IBM ESSL library and some obviously parallelizable for-loops were decorated by OpenMP pragmas, so that the experiments with large data sets can be completed in a reasonable time frame." This describes the implementation details and libraries used but does not provide concrete access to the authors' own source code for the methodology described.
Open Datasets Yes Table 1 summarizes the data sets used for experiments. They were all downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm/ and are widely used as benchmarks of kernel methods.
Dataset Splits Yes Some of the data sets come with a training and a testing part; for those that do not, we performed a 4:1 split to create the two parts.
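The 4:1 split described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, the use of NumPy, and the fixed seed are assumptions for reproducibility of the sketch.

```python
import numpy as np

def split_4_to_1(X, y, seed=0):
    """Randomly split (X, y) into an 80% training part and a 20%
    testing part, i.e. a 4:1 ratio as described in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = (4 * len(X)) // 5          # 4:1 ratio -> 80% train
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]
```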
Hardware Specification Yes The experiments were performed on one node of a computing cluster with 512 GB memory and an IBM POWER8 12-core processor, unless otherwise stated.
Software Dependencies No The programs were implemented in C++. The linear algebra routines (BLAS and LAPACK) were linked against the IBM ESSL library and some obviously parallelizable for-loops were decorated by OpenMP pragmas, so that the experiments with large data sets can be completed in a reasonable time frame. However, the programs were neither fully optimized nor fully parallelized, leaving room for improvement on both memory consumption and execution time.
Experiment Setup Yes We use the Gaussian kernel (5) as an example. As hinted earlier, the choice of the range parameter σ affects the quality of various kernels. Therefore, the experiment setup is to use a reasonable regularization λ = 0.01 and to vary the choice of σ in a large interval (between 0.01 and 100) such that the optimal σ falls within the interval. We set the rank r (and the leaf size n0) according to (22) with three particular choices: r = 32, 129, and 516.
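The setup above (fixed regularization λ = 0.01, σ swept over [0.01, 100]) can be sketched with a plain Gaussian kernel ridge regression. This is an illustrative stand-in only: it uses the exact (non-hierarchical) Gaussian kernel, a common parameterization exp(-||x-z||²/(2σ²)), and a log-spaced σ grid, none of which are taken verbatim from the paper's implementation.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2)).
    One common parameterization; the paper's eq. (5) may differ by a constant."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_predict(Xtr, ytr, Xte, sigma, lam=0.01):
    """Kernel ridge regression: solve (K + lam I) alpha = y, then predict."""
    K = gaussian_kernel(Xtr, Xtr, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)
    return gaussian_kernel(Xte, Xtr, sigma) @ alpha

# Sweep sigma over a log-spaced grid spanning [0.01, 100],
# as in the described experiment setup.
sigmas = np.logspace(-2, 2, 9)
```

A full run would evaluate test error at each σ on the grid and report the best, confirming the optimum falls inside the interval.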