Composite Goodness-of-fit Tests with Kernels
Authors: Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the following section, we now study the performance of our tests in a range of settings. We consider three examples: (i) a multivariate Gaussian location and scale model, (ii) a generative model of the interaction of genes through time called a toggle-switch model, and (iii) an unnormalised nonparametric density model. For all experiments we set the test level α = 0.05, and give full implementation details in Section D. |
| Researcher Affiliation | Academia | Oscar Key EMAIL Centre for Artificial Intelligence, University College London Arthur Gretton EMAIL Gatsby Computational Neuroscience Unit, University College London François-Xavier Briol EMAIL Department of Statistical Science, University College London Tamara Fernandez EMAIL Faculty of Engineering and Science, Adolfo Ibáñez University |
| Pseudocode | Yes | Algorithm 1: Wild bootstrap (MMD) ... Algorithm 2: Wild bootstrap (KSD) ... Algorithm 3: Parametric bootstrap |
| Open Source Code | Yes | The code to reproduce our experiments, and implement the tests for your own models, is available at: https://github.com/oscarkey/composite-tests |
| Open Datasets | Yes | Matsubara et al. (2022) use this model for a density estimation task on a data set of velocities of 82 galaxies (Postman et al., 1986; Roeder, 1990), and use the approximation with p = 25 for computational convenience. |
| Dataset Splits | No | The paper describes generating data for experiments and using existing datasets, but does not explicitly provide training/test/validation splits needed to reproduce the experiment. For example, in the Gaussian Model and Toggle-Switch Model sections, data is generated without explicit splits, and the Density Estimation section uses a single dataset without specifying reproduction splits. |
| Hardware Specification | Yes | The experiments are implemented in JAX, and executed on an Nvidia RTX 3090 GPU. |
| Software Dependencies | No | The experiments are implemented in JAX, and executed on an Nvidia RTX 3090 GPU. While JAX is mentioned as a software component, no specific version number is provided for JAX or any other libraries used. |
| Experiment Setup | Yes | For the experiments using the MMD, we compute the minimum distance estimator using the Adam optimiser (Kingma and Ba, 2015). We use a learning rate of 0.05 and 100 iterations. When θ = µd (Figures 4 to 6) we sample the initial value of µd from N(0d, Id). When θ = {µ, σ²} (Figures 3 and 7) we sample the initial µ from N(0, 1), and sample σ from a standard normal distribution truncated between 0.5 and 1.5. ... We use stochastic gradient descent with random restarts to estimate the parameters. The algorithm is: ... 3. Run SGD 15 times, once for each of the selected θi init, to find ˆθ1 n, . . . , ˆθ15 n. Use learning rate 0.04 and perform 300 iterations. |
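The wild bootstrap listed under the Pseudocode row (Algorithms 1 and 2) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes the test statistic is the mean of an (n, n) kernel or Stein matrix `H` already evaluated on the sample, and resamples the degenerate V-statistic with Rademacher multipliers. The function name `wild_bootstrap_pvalue` and the `n_boot` default are hypothetical.

```python
import numpy as np

def wild_bootstrap_pvalue(H, n_boot=1000, rng=None):
    """Wild-bootstrap p-value for a degenerate V-statistic T = mean(H).

    H: (n, n) symmetric kernel/Stein matrix evaluated on the sample.
    Each bootstrap replicate reweights H with i.i.d. Rademacher signs,
    mimicking the statistic's null distribution.
    """
    rng = np.random.default_rng(rng)
    n = H.shape[0]
    t_obs = H.mean()
    # Rademacher multipliers eps_i in {-1, +1}, one row per replicate
    eps = rng.choice([-1.0, 1.0], size=(n_boot, n))
    # Bootstrap statistic: (1/n^2) * eps' H eps for each replicate
    t_boot = np.einsum("bi,ij,bj->b", eps, H, eps) / n**2
    # Add-one p-value to keep it in (0, 1]
    return (1 + np.sum(t_boot >= t_obs)) / (1 + n_boot)
```

The test then rejects when this p-value falls below the level α = 0.05 used throughout the paper's experiments.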
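The random-restart estimation procedure quoted in the Experiment Setup row (15 SGD runs, learning rate 0.04, 300 iterations, keep the best) can be sketched generically. The names `sgd_with_restarts` and `loss_grad` are assumptions, and a toy quadratic stands in for the paper's MMD/KSD objective; the paper's actual runs use JAX and Adam where noted.

```python
import numpy as np

def sgd_with_restarts(loss_grad, init_sampler, n_restarts=15,
                      lr=0.04, n_iters=300, rng=None):
    """Minimise a loss from several random initialisations.

    loss_grad(theta) -> (loss, gradient); init_sampler(rng) -> initial theta.
    Returns the parameter with the lowest final loss across restarts.
    """
    rng = np.random.default_rng(rng)
    best_theta, best_loss = None, np.inf
    for _ in range(n_restarts):
        theta = init_sampler(rng)
        for _ in range(n_iters):
            _, g = loss_grad(theta)
            theta = theta - lr * g  # plain gradient step
        final_loss, _ = loss_grad(theta)
        if final_loss < best_loss:
            best_theta, best_loss = theta, final_loss
    return best_theta, best_loss
```

For example, with the toy objective (θ − 2)², `sgd_with_restarts(lambda t: ((t - 2.0)**2, 2.0*(t - 2.0)), lambda rng: rng.normal())` recovers θ ≈ 2 regardless of the random initialisation.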