MMD Aggregated Two-Sample Test

Authors: Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that MMDAgg significantly outperforms alternative state-of-the-art MMD-based two-sample tests on synthetic data satisfying the Sobolev smoothness assumption, and that, on real-world image data, MMDAgg closely matches the power of tests leveraging the use of models such as neural networks.
Researcher Affiliation | Academia | Antonin Schrab (Centre for Artificial Intelligence, University College London & Inria London; Gatsby Computational Neuroscience Unit, University College London, London, WC1V 6LJ, UK); Ilmun Kim (Department of Statistics & Data Science, Department of Applied Statistics, Yonsei University, Seoul, 03722, South Korea); Mélisande Albert (Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, INSA, France); Béatrice Laurent (Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, INSA, France); Benjamin Guedj (Centre for Artificial Intelligence, University College London & Inria London, London, WC1V 6LJ, UK); Arthur Gretton (Gatsby Computational Neuroscience Unit, University College London, London, W1T 4JG, UK)
Pseudocode | Yes | Algorithm 1: MMDAgg with kernel collection Λ, weights w, simulation parameters B1:3, and level α
Open Source Code | Yes | We provide a user-friendly parameter-free implementation of MMDAgg, both in Jax and in Numpy, available at https://github.com/antoninschrab/mmdagg-paper. This repository also contains code for the reproducibility of our experiments.
Open Datasets | Yes | We consider the MNIST dataset (LeCun et al., 2010) down-sampled to 7 × 7 images.
Dataset Splits | Yes | For the split and oracle tests, we use two equal halves of the data, and oracle is run on twice the sample sizes.
Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types, or memory amounts) are provided for running the experiments.
Software Dependencies | No | We provide a user-friendly parameter-free implementation of MMDAgg, both in Jax and in Numpy, available at https://github.com/antoninschrab/mmdagg-paper. However, specific version numbers for these libraries are not provided.
Experiment Setup | Yes | We use level α = 0.05 for all our experiments. We use B1 = 2000 and B2 = 2000 simulated test statistics to estimate the quantiles and the probability in Equation (13) for the level correction, respectively, and use B3 = 50 steps of bisection method. For the median, split and oracle tests, we use B = 500 simulated test statistics to estimate the quantile.
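The pseudocode row above refers to Algorithm 1, which aggregates MMD tests over a collection of kernels. As background, here is a minimal NumPy sketch of the quadratic-time biased MMD² estimate such tests are built on; the function names and the choice of a Gaussian kernel are illustrative, not the paper's exact implementation:

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth):
    # Gaussian kernel matrix between samples x of shape (m, d) and y of shape (n, d).
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd2_biased(x, y, bandwidth):
    # Biased quadratic-time estimate of the squared MMD between samples x and y.
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))  # sample from P
y = rng.normal(1.0, 1.0, size=(200, 2))  # sample from a shifted Q
print(mmd2_biased(x, y, bandwidth=1.0))  # noticeably positive for separated samples
```

MMDAgg's contribution is aggregating such statistics over several bandwidths instead of committing to one; the single-bandwidth estimate above is only the building block.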
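The MNIST experiment uses images down-sampled to 7 × 7. One common way to achieve this, shown here as an illustrative sketch (the paper's exact down-sampling procedure may differ), is to average non-overlapping 4 × 4 blocks of the 28 × 28 image:

```python
import numpy as np

def downsample(img, factor=4):
    # Average non-overlapping factor x factor blocks: 28x28 -> 7x7 for factor=4.
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

image = np.arange(28 * 28, dtype=float).reshape(28, 28)  # stand-in for an MNIST digit
small = downsample(image)
print(small.shape)  # (7, 7)
```

Block averaging preserves the overall intensity (the mean pixel value is unchanged) while reducing the dimension from 784 to 49.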
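The experiment setup row estimates test thresholds from simulated test statistics (e.g. B = 500 for the median, split and oracle tests). A generic sketch of that idea, estimating the (1 − α) quantile of a statistic under the null by recomputing it on permutations of the pooled sample; the statistic and helper names here are illustrative, not the paper's code:

```python
import numpy as np

def permutation_quantile(x, y, statistic, alpha=0.05, B=500, seed=0):
    # Estimate the (1 - alpha) quantile of the statistic under the null
    # hypothesis by recomputing it on B random splits of the pooled sample.
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    m = len(x)
    stats = np.empty(B)
    for b in range(B):
        perm = rng.permutation(len(pooled))
        stats[b] = statistic(pooled[perm[:m]], pooled[perm[m:]])
    return np.quantile(stats, 1 - alpha)

# Toy statistic standing in for MMD: absolute difference of sample means.
stat = lambda a, b: abs(a.mean() - b.mean())
rng = np.random.default_rng(1)
x, y = rng.normal(0, 1, 100), rng.normal(1, 1, 100)
threshold = permutation_quantile(x, y, stat, alpha=0.05, B=500)
print(stat(x, y) > threshold)  # reject the null when the observed statistic exceeds the threshold
```

The paper's level correction goes further, calibrating a shared level across the aggregated tests with B3 bisection steps, but the permutation-quantile step sketched above is the core calibration primitive.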