Kernel Quantile Embeddings and Associated Probability Metrics

Authors: Masha Naslidnyk, Siu Lun Chau, Francois-Xavier Briol, Krikamol Muandet

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the effectiveness of KQDs for nonparametric two-sample hypothesis testing... It is studied empirically in Section 5 with experiments on two-sample hypothesis testing... We conduct four experiments: two using synthetic data... and two based on high-dimensional image data...
Researcher Affiliation | Academia | 1 Department of Computer Science, University College London, London, UK; 2 College of Computing & Data Science, Nanyang Technological University, Singapore; 3 Department of Statistical Science, University College London, London, UK; 4 CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Correspondence to: Masha Naslidnyk <EMAIL>.
Pseudocode | Yes | Algorithm 1: Gaussian e-KQD
Open Source Code | Yes | The code is available at https://github.com/MashaNaslidnyk/kqe.
Open Datasets | Yes | 3. Galaxy MNIST. We examine performance on real-world data through galaxy images (Walmsley et al., 2022)... 4. CIFAR-10 vs. CIFAR-10.1. We conclude with an experiment on telling apart the CIFAR-10 (Krizhevsky et al., 2012) and CIFAR-10.1 (Recht et al., 2019) test sets...
Dataset Splits | Yes | We consider a significance level α of 0.05 throughout and report on Type I error control in Appendix D. To determine the rejection threshold for each test statistic, we employ a permutation-based approach: for each trial we pool the two samples, randomly reassign labels 300 times to simulate draws under H0, compute the test statistic on each permuted split, and take the 95th percentile of this empirical null distribution as our threshold.
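The permutation-based thresholding procedure quoted above can be sketched as follows. This is a minimal illustration, not the paper's code: `statistic` is a placeholder for the actual KQD test statistic, and the difference-of-means statistic in the usage comment is purely for demonstration.

```python
import numpy as np

def permutation_threshold(x, y, statistic, n_perms=300, alpha=0.05, seed=None):
    """Rejection threshold via permutations, following the quoted recipe:
    pool the two samples, reassign labels n_perms times to simulate draws
    under H0, and take the (1 - alpha) quantile of the null statistics."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y], axis=0)
    n = len(x)
    null_stats = []
    for _ in range(n_perms):
        perm = rng.permutation(len(pooled))
        # Recompute the statistic on a random relabelling of the pooled data.
        null_stats.append(statistic(pooled[perm[:n]], pooled[perm[n:]]))
    return float(np.quantile(null_stats, 1.0 - alpha))

# Hypothetical usage with a placeholder statistic (difference of means):
# stat = lambda a, b: float(abs(a.mean() - b.mean()))
# reject = stat(x, y) > permutation_threshold(x, y, stat, seed=0)
```

With alpha = 0.05 and 300 permutations, the threshold is the 95th percentile of the empirical null distribution, matching the paper's description.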
Hardware Specification | No | No specific hardware details (such as GPU models, CPU types, or memory amounts) are mentioned in the paper.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers.
Experiment Setup | Yes | For e-KQD, we set the number of projections to l = log n and the number of samples drawn from the Gaussian reference to m = log n. ... We take power p = 2 for all KQD-based discrepancies... We use the RBF kernel k(x, x′) = exp(−‖x − x′‖² / (2σ²)) with bandwidth σ chosen using the median heuristic, i.e. σ = Median({‖xᵢ − xⱼ‖₂², i, j ∈ 1, . . . , n}). ... We consider a significance level α of 0.05 throughout... we select a polynomial kernel of degree 3, i.e. k(x, x′) = (⟨x, x′⟩ + 1)³, for all our methods.
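The kernel and bandwidth choices quoted above can be sketched as below. This follows the paper's stated convention of setting σ to the median of pairwise squared Euclidean distances; note that other works instead use the median of (unsquared) distances, so this is one of several common median-heuristic variants.

```python
import numpy as np

def median_heuristic_bandwidth(x):
    """Median heuristic as quoted: sigma = median of pairwise squared
    Euclidean distances ||x_i - x_j||^2 over the sample."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # Exclude the zero diagonal by taking the strict upper triangle.
    return float(np.median(d2[np.triu_indices_from(d2, k=1)]))

def rbf_kernel(x, y, sigma):
    """RBF kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def poly_kernel(x, y, degree=3):
    """Degree-3 polynomial kernel k(x, x') = (<x, x'> + 1)^3."""
    return (x @ y.T + 1.0) ** degree
```

A Gram matrix for a sample is then `rbf_kernel(x, x, median_heuristic_bandwidth(x))`; its diagonal is identically 1 and the matrix is symmetric.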