Kernel Quantile Embeddings and Associated Probability Metrics

Authors: Masha Naslidnyk, Siu Lun Chau, Francois-Xavier Briol, Krikamol Muandet

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the effectiveness of KQDs for nonparametric two-sample hypothesis testing... It is studied empirically in Section 5 with experiments on two-sample hypothesis testing... We conduct four experiments: two using synthetic data... and two based on high-dimensional image data...
Researcher Affiliation | Academia | 1 Department of Computer Science, University College London, London, UK; 2 College of Computing & Data Science, Nanyang Technological University, Singapore; 3 Department of Statistical Science, University College London, London, UK; 4 CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Correspondence to: Masha Naslidnyk <EMAIL>.
Pseudocode | Yes | Algorithm 1: Gaussian e-KQD
Open Source Code | Yes | The code is available at https://github.com/MashaNaslidnyk/kqe.
Open Datasets | Yes | 3. Galaxy MNIST. We examine performance on real-world data through galaxy images (Walmsley et al., 2022)... 4. CIFAR-10 vs. CIFAR-10.1. We conclude with an experiment on telling apart the CIFAR-10 (Krizhevsky et al., 2012) and CIFAR-10.1 (Recht et al., 2019) test sets...
Dataset Splits | Yes | We consider a significance level α of 0.05 throughout and report on Type I error control in Appendix D. To determine the rejection threshold for each test statistic, we employ a permutation-based approach: for each trial we pool the two samples, randomly reassign labels 300 times to simulate draws under H0, compute the test statistic on each permuted split, and take the 95th percentile of this empirical null distribution as our threshold.
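The permutation-based thresholding procedure quoted above can be sketched as follows. This is a minimal illustration, not the paper's code: `statistic` is a placeholder for the actual KQD test statistic, and the difference-of-means statistic in the usage comment is purely for demonstration.

```python
import numpy as np

def permutation_threshold(x, y, statistic, n_perms=300, alpha=0.05, seed=None):
    """Rejection threshold via permutations, following the quoted recipe:
    pool the two samples, reassign labels n_perms times to simulate draws
    under H0, and take the (1 - alpha) quantile of the null statistics."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y], axis=0)
    n = len(x)
    null_stats = []
    for _ in range(n_perms):
        perm = rng.permutation(len(pooled))
        # Recompute the statistic on a random relabelling of the pooled data.
        null_stats.append(statistic(pooled[perm[:n]], pooled[perm[n:]]))
    return float(np.quantile(null_stats, 1.0 - alpha))

# Hypothetical usage with a placeholder statistic (difference of means):
# stat = lambda a, b: float(abs(a.mean() - b.mean()))
# reject = stat(x, y) > permutation_threshold(x, y, stat, seed=0)
```

With alpha = 0.05 and 300 permutations, the threshold is the 95th percentile of the empirical null distribution, matching the paper's description.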
Hardware Specification | No | No specific hardware details (such as GPU models, CPU types, or memory amounts) are mentioned in the paper.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers.
Experiment Setup | Yes | For e-KQD, we set the number of projections to l = log n and the number of samples drawn from the Gaussian reference to m = log n. ... We take power p = 2 for all KQD-based discrepancies... We use the RBF kernel k(x, x′) = exp(−‖x − x′‖² / (2σ²)) with bandwidth σ chosen using the median heuristic, i.e. σ = Median({‖xᵢ − xⱼ‖₂², i, j ∈ 1, . . . , n}). ... We consider a significance level α of 0.05 throughout... we select a polynomial kernel of degree 3, i.e. k(x, x′) = (⟨x, x′⟩ + 1)³, for all our methods.
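The kernel and bandwidth choices quoted above can be sketched as below. This follows the paper's stated convention of setting σ to the median of pairwise squared Euclidean distances; note that other works instead use the median of (unsquared) distances, so this is one of several common median-heuristic variants.

```python
import numpy as np

def median_heuristic_bandwidth(x):
    """Median heuristic as quoted: sigma = median of pairwise squared
    Euclidean distances ||x_i - x_j||^2 over the sample."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # Exclude the zero diagonal by taking the strict upper triangle.
    return float(np.median(d2[np.triu_indices_from(d2, k=1)]))

def rbf_kernel(x, y, sigma):
    """RBF kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def poly_kernel(x, y, degree=3):
    """Degree-3 polynomial kernel k(x, x') = (<x, x'> + 1)^3."""
    return (x @ y.T + 1.0) ** degree
```

A Gram matrix for a sample is then `rbf_kernel(x, x, median_heuristic_bandwidth(x))`; its diagonal is identically 1 and the matrix is symmetric.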