MMD Aggregated Two-Sample Test
Authors: Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that MMDAgg significantly outperforms alternative state-of-the-art MMD-based two-sample tests on synthetic data satisfying the Sobolev smoothness assumption, and that, on real-world image data, MMDAgg closely matches the power of tests leveraging the use of models such as neural networks. |
| Researcher Affiliation | Academia | Antonin Schrab EMAIL Centre for Artificial Intelligence, University College London & Inria London; Gatsby Computational Neuroscience Unit, University College London, London, WC1V 6LJ, UK. Ilmun Kim EMAIL Department of Statistics & Data Science, Department of Applied Statistics, Yonsei University, Seoul, 03722, South Korea. Mélisande Albert EMAIL Institut de Mathématiques de Toulouse; UMR 5219, Université de Toulouse; CNRS, INSA; France. Béatrice Laurent EMAIL Institut de Mathématiques de Toulouse; UMR 5219, Université de Toulouse; CNRS, INSA; France. Benjamin Guedj EMAIL Centre for Artificial Intelligence, University College London & Inria London, London, WC1V 6LJ, UK. Arthur Gretton EMAIL Gatsby Computational Neuroscience Unit, University College London, London, W1T 4JG, UK |
| Pseudocode | Yes | Algorithm 1: MMDAgg (weights Λ_w, parameters B_1, B_2, B_3, level α) |
| Open Source Code | Yes | We provide a user-friendly parameter-free implementation of MMDAgg, both in Jax and in Numpy, available at https://github.com/antoninschrab/mmdagg-paper. This repository also contains code for the reproducibility of our experiments. |
| Open Datasets | Yes | we consider the MNIST dataset (Le Cun et al., 2010) down-sampled to 7 × 7 images. |
| Dataset Splits | Yes | For the split and oracle tests, we use two equal halves of the data, and oracle is run on twice the sample sizes. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types, or memory amounts) are provided for running the experiments. |
| Software Dependencies | No | We provide a user-friendly parameter-free implementation of MMDAgg, both in Jax and in Numpy, available at https://github.com/antoninschrab/mmdagg-paper. However, specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | We use level α = 0.05 for all our experiments. We use B1 = 2000 and B2 = 2000 simulated test statistics to estimate the quantiles and the probability in Equation (13) for the level correction, respectively, and use B3 = 50 steps of bisection method. For the median, split and oracle tests, we use B = 500 simulated test statistics to estimate the quantile. |
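The setup row above describes the aggregation parameters: B1 permutations estimate the per-bandwidth quantiles, B2 simulations estimate the probability in the paper's Equation (13), and B3 bisection steps tune the level correction. The sketch below illustrates the general idea of an aggregated MMD permutation test in NumPy; it is not the authors' implementation (available at the linked repository). It uses uniform weights and a simple Bonferroni-style correction in place of the paper's exact bisection-based level correction, and `mmdagg_sketch` is a hypothetical helper name.

```python
import numpy as np

def mmd2_biased(K, n):
    # Biased quadratic-time MMD^2 estimate from the joint kernel
    # matrix K of the pooled sample Z = [X; Y], each of size n.
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

def mmdagg_sketch(x, y, bandwidths, alpha=0.05, b1=200, seed=0):
    # Aggregated two-sample test over a collection of Gaussian
    # bandwidths; rejects if any single test rejects at its
    # weight-corrected level (Bonferroni-style simplification of
    # the exact correction in the paper).
    rng = np.random.default_rng(seed)
    n = len(x)
    z = np.concatenate([x, y])
    d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    weights = np.full(len(bandwidths), 1.0 / len(bandwidths))  # uniform weights
    reject = False
    for lam, w in zip(bandwidths, weights):
        K = np.exp(-d2 / (2.0 * lam ** 2))
        stat = mmd2_biased(K, n)
        # Permutation null: relabel the pooled sample b1 times.
        perms = np.empty(b1)
        for b in range(b1):
            p = rng.permutation(2 * n)
            perms[b] = mmd2_biased(K[np.ix_(p, p)], n)
        q = np.quantile(perms, 1.0 - alpha * w)  # corrected level alpha * w
        if stat > q:
            reject = True
    return reject
```

With a clear mean shift between the two samples, the sketch rejects; with identical distributions it holds the (conservative) level by construction of the Bonferroni correction.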