Learning Kernel Tests Without Data Splitting

Authors: Jonas Kübler, Wittawat Jitkrittum, Bernhard Schölkopf, Krikamol Muandet

NeurIPS 2020

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
  "At the same significance level, our approach's test power is empirically larger than that of the data-splitting approach, regardless of its split proportion." "The empirical results suggest that, at the same significance level, the test power of our approach is larger than that of the data-splitting approach, regardless of the split proportion (cf. Section 5)." "We demonstrate the advantages of OST over data-splitting approaches and the Wald test with kernel two-sample testing problems as described in Section 2."

Researcher Affiliation | Collaboration
  Jonas M. Kübler, Wittawat Jitkrittum, Bernhard Schölkopf, Krikamol Muandet. Max Planck Institute for Intelligent Systems, Tübingen, Germany (EMAIL, EMAIL). Now with Google Research.

Pseudocode | Yes
  "Algorithm 1 One-Sided Test (OST)"

Open Source Code | Yes
  "The code for the experiments is available at https://github.com/MPI-IS/tests-wo-splitting."

Open Datasets | Yes
  "2. MNIST (p = 49): We consider downsampled 7x7 images of the MNIST dataset [40], where P contains all the digits and Q only uneven digits."

Dataset Splits | No
  The paper discusses data splitting, where a portion of the data is used for learning and the rest for testing (e.g., "SPLIT0.1 denotes that 10% of the data are used for learning β and 90% are used for testing"). However, it does not define distinct training, validation, and test splits, nor give percentages for a three-way split.

Hardware Specification | No
  No specific hardware details (e.g., GPU/CPU models, memory, or cloud resources) used to run the experiments are provided in the paper.

Software Dependencies | No
  The paper cites "SciPy: Open source scientific tools for Python" [26] and "The cvxopt linear and quadratic cone program solvers" [43], but gives no version numbers for these or other key software components.

Experiment Setup | Yes
  "For all the setups we estimate the Type-II error for various sample sizes at a level α = 0.05. Error rates are estimated over 5000 independent trials." "For each dataset we consider three different base sets of kernels K and choose σ with the median heuristic: (a) d = 1: K = [k_σ], (b) d = 2: K = [k_σ, k_lin], (c) d = 6: K = [k_{0.25σ}, k_{0.5σ}, k_σ, k_{2σ}, k_{4σ}, k_lin]."
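To make the experiment-setup row concrete, the sketch below shows the ingredients it describes: the median heuristic for choosing the bandwidth σ, the base kernel set of type (c) (Gaussian kernels at scales {0.25, 0.5, 1, 2, 4}·σ plus a linear kernel), and the unbiased squared-MMD statistic that underlies kernel two-sample tests. This is an illustrative sketch, not the authors' released code; the synthetic samples, sample sizes, mean shift, and random seed are assumptions made here for demonstration.

```python
import numpy as np

def median_heuristic(X, Y):
    """Bandwidth sigma = median pairwise Euclidean distance on the pooled sample."""
    Z = np.vstack([X, Y])
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    # use only the strictly upper-triangular (off-diagonal) distances
    return np.sqrt(np.median(d2[np.triu_indices_from(d2, k=1)]))

def gaussian_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix k_sigma(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2_unbiased(X, Y, kernel):
    """Unbiased estimate of the squared MMD between the samples X ~ P and Y ~ Q."""
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = kernel(X, X), kernel(Y, Y), kernel(X, Y)
    np.fill_diagonal(Kxx, 0.0)  # drop diagonal terms for unbiasedness
    np.fill_diagonal(Kyy, 0.0)
    return (Kxx.sum() / (n * (n - 1))
            + Kyy.sum() / (m * (m - 1))
            - 2.0 * Kxy.mean())

# Hypothetical toy data: P and Q are Gaussians that differ by a mean shift.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))  # sample from P
Y = rng.normal(1.0, 1.0, size=(200, 2))  # sample from Q

sigma = median_heuristic(X, Y)

# Base kernel set of type (c): scaled Gaussian kernels plus a linear kernel.
kernels = {f"k_{s}sigma": (lambda A, B, s=s: gaussian_kernel(A, B, s * sigma))
           for s in (0.25, 0.5, 1.0, 2.0, 4.0)}
kernels["k_lin"] = lambda A, B: A @ B.T

for name, k in kernels.items():
    print(f"{name}: MMD^2 = {mmd2_unbiased(X, Y, k):.4f}")
```

Each kernel in the base set yields one MMD^2 estimate; the paper's methods then differ in how a combination of these statistics is selected and tested (OST versus data splitting).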