Kernel Quantile Embeddings and Associated Probability Metrics
Authors: Masha Naslidnyk, Siu Lun Chau, Francois-Xavier Briol, Krikamol Muandet
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness of KQDs for nonparametric two-sample hypothesis testing... It is studied empirically in Section 5 with experiments on two-sample hypothesis testing... We conduct four experiments: two using synthetic data... and two based on high-dimensional image data... |
| Researcher Affiliation | Academia | (1) Department of Computer Science, University College London, London, UK; (2) College of Computing & Data Science, Nanyang Technological University, Singapore; (3) Department of Statistical Science, University College London, London, UK; (4) CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Correspondence to: Masha Naslidnyk <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Gaussian e-KQD |
| Open Source Code | Yes | The code is available at https://github.com/MashaNaslidnyk/kqe. |
| Open Datasets | Yes | 3. Galaxy MNIST. We examine performance on real-world data through galaxy images (Walmsley et al., 2022)... 4. CIFAR-10 vs. CIFAR-10.1. We conclude with an experiment on telling apart the CIFAR-10 (Krizhevsky et al., 2012) and CIFAR-10.1 (Recht et al., 2019) test sets... |
| Dataset Splits | Yes | We consider a significance level α of 0.05 throughout and report on Type I control in Appendix D. To determine the rejection threshold for each test statistic, we employ a permutation-based approach: for each trial we pool the two samples, randomly reassign labels 300 times to simulate draws under H0, compute the test statistic on each permuted split, and take the 95th percentile of this empirical null distribution as our threshold. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory amounts) are mentioned in the paper. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers. |
| Experiment Setup | Yes | For e-KQD, we set the number of projections to $l = \log n$ and the number of samples drawn from the Gaussian reference to $m = \log n$. ... We take power $p = 2$ for all KQD-based discrepancies... We use the RBF kernel $k(x, x') = \exp(-\lVert x - x' \rVert^2 / (2\sigma^2))$ with σ the bandwidth chosen using the median heuristic method, i.e. $\sigma = \mathrm{Median}(\{\lVert x_i - x_j \rVert_2^2,\ i, j \in 1, \dots, n\})$. ... We consider a significance level α of 0.05 throughout... we select a polynomial kernel of degree 3, i.e. $k(x, x') = (\langle x, x' \rangle + 1)^3$, for all our methods. |
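
The permutation threshold quoted under Dataset Splits and the kernels quoted under Experiment Setup can be sketched in a few lines of dependency-free Python. This is an illustrative sketch, not the authors' code: the function names (`median_heuristic_bandwidth`, `mmd2_biased`, `permutation_test`) are hypothetical, the statistic is a biased squared-MMD stand-in rather than the paper's KQD or Algorithm 1, the median is taken over distinct pairs only, and the "95th percentile" is read as the order statistic at rank ceil(0.95 * 300).

```python
import math
import random
import statistics


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def median_heuristic_bandwidth(points):
    """Median of squared pairwise distances, following the quoted formula.

    Assumption: only distinct pairs (i < j) enter the median.
    """
    n = len(points)
    dists = [sq_dist(points[i], points[j])
             for i in range(n) for j in range(i + 1, n)]
    return statistics.median(dists)


def rbf_kernel(x, y, sigma):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    return math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))


def poly_kernel(x, y, degree=3):
    """k(x, x') = (<x, x'> + 1)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** degree


def mmd2_biased(xs, ys, kernel):
    """Stand-in test statistic (biased squared MMD); NOT the paper's KQD."""
    def mean_k(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def permutation_test(xs, ys, statistic, n_perms=300, alpha=0.05, rng=None):
    """Pool the samples, reshuffle labels n_perms times, and compare the
    observed statistic with the (1 - alpha) quantile of the permuted ones."""
    rng = rng or random.Random(0)
    observed = statistic(xs, ys)
    pooled = list(xs) + list(ys)
    null_stats = []
    for _ in range(n_perms):
        rng.shuffle(pooled)  # random relabelling simulates a draw under H0
        null_stats.append(statistic(pooled[:len(xs)], pooled[len(xs):]))
    null_stats.sort()
    # Assumed quantile convention: order statistic at rank ceil((1 - alpha) * n_perms).
    idx = min(n_perms - 1, math.ceil((1.0 - alpha) * n_perms) - 1)
    threshold = null_stats[idx]
    return observed, threshold, observed > threshold


if __name__ == "__main__":
    data_rng = random.Random(1)
    xs = [(data_rng.gauss(0.0, 1.0),) for _ in range(20)]
    ys = [(data_rng.gauss(5.0, 1.0),) for _ in range(20)]
    sigma = median_heuristic_bandwidth(xs + ys)
    stat = lambda a, b: mmd2_biased(a, b, lambda u, v: rbf_kernel(u, v, sigma))
    observed, threshold, reject = permutation_test(xs, ys, stat)
    print(f"observed={observed:.4f} threshold={threshold:.4f} reject={reject}")
```

Passing the statistic as a callable keeps the permutation loop independent of the discrepancy being tested, so a KQD implementation could be dropped in place of the MMD stand-in without changing the thresholding logic.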