Kernel Thinning

Authors: Raaz Dwivedi, Lester Mackey

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions d = 2 through 100. We complement our primary methodological and theoretical development with two vignettes illustrating the promise of kernel thinning for improving upon (a) i.i.d. sampling in dimensions d = 2 through 100 and (b) standard MCMC thinning when targeting challenging differential equation posteriors.
Researcher Affiliation | Collaboration | Raaz Dwivedi (EMAIL), Cornell Tech; Lester Mackey (EMAIL), Microsoft Research New England
Pseudocode | Yes | Algorithm 1: Kernel Thinning; Algorithm 1a: kt-split; Algorithm 1b: kt-swap; Algorithm 2: Kernel Halving; Algorithm 3: Self-Balancing Hilbert Walk
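The algorithms themselves are specified in the paper's pseudocode. As a rough illustration only, the sketch below is a greatly simplified, derandomized caricature of the pair-balancing idea behind kernel halving: walk over consecutive pairs of points and sign each pair so the signed kernel discrepancy between the two halves stays small. The paper's Algorithm 2 instead uses a randomized self-balancing Hilbert walk with adaptive thresholds; the function names and the greedy sign rule here are assumptions, not the authors' method.

```python
import numpy as np

def gauss_kernel(X, Y, sigma):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def greedy_halve(X, sigma):
    """Split X into two halves of equal size by greedily signing consecutive
    pairs to shrink the RKHS norm of the signed discrepancy f = sum_j s_j k(z_j, .).
    (Simplified, deterministic caricature of kernel halving -- not Algorithm 2.)"""
    n = len(X)
    K = gauss_kernel(X, X, sigma)
    signs = np.zeros(n)
    for i in range(0, n - 1, 2):
        x, y = i, i + 1
        # <f, k(x,.) - k(y,.)> under the current signed assignment
        inner = signs @ (K[:, x] - K[:, y])
        # choose the orientation that reduces (or least increases) ||f||
        s = -1.0 if inner > 0 else 1.0
        signs[x], signs[y] = s, -s
    return X[signs > 0], X[signs < 0]
```

Recursively halving m times in this fashion is what takes an input of n points down to a candidate coreset of size n / 2^m.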
Open Source Code | Yes | See App. P for supplementary details and https://github.com/microsoft/goodpoints for a Python implementation of kernel thinning and code replicating each vignette.
Open Datasets | Yes | Next, we illustrate the benefits of kernel thinning over standard Markov chain Monte Carlo (MCMC) thinning on twelve posterior inference experiments conducted by Riabiz et al. (2021). From Riabiz et al. (2020), we obtain the output of four distinct MCMC procedures targeting each of two d = 4-dimensional posterior distributions P: (1) a posterior over the parameters of the Goodwin model of oscillatory enzymatic control (Goodwin, 1965) and (2) a posterior over the parameters of the Lotka–Volterra model of oscillatory predator–prey evolution (Lotka, 1925; Volterra, 1926). Marina Riabiz, Wilson Ye Chen, Jon Cockayne, Pawel Swietach, Steven A. Niederer, Lester Mackey, and Chris J. Oates. Replication Data for: Optimal Thinning of MCMC Output, 2020. URL https://doi.org/10.7910/DVN/MDKNWM. Accessed on Mar 23, 2021.
Dataset Splits | No | We discard the initial points of each chain as burn-in, using the maximum burn-in period reported in Riabiz et al. (2021, Tabs. S4 & S6, App. S5.4), and normalize each Hinch chain by subtracting the post-burn-in sample mean and dividing each coordinate by its post-burn-in sample standard deviation. For all experiments, we used only the odd indices of the post-burn-in sample points when thinning to form Sn. The paper describes preprocessing and point selection (burn-in, thinning) but no explicit training/validation/test splits as typically understood for model training.
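The preprocessing pipeline quoted above (burn-in removal, per-coordinate standardization, odd-index subsampling) could be sketched as follows. The function name, the 0-based reading of "odd indices", and the example burn-in value are assumptions for illustration, not the authors' code; note also the paper applies the standardization step only to the Hinch chains.

```python
import numpy as np

def preprocess_chain(chain, burn_in):
    """Prepare an MCMC chain (shape (T, d)) for thinning:
    1. discard the first `burn_in` draws,
    2. standardize each coordinate by the post-burn-in mean and std
       (the paper does this only for the Hinch chains),
    3. keep only odd indices of the post-burn-in points."""
    post = chain[burn_in:]
    post = (post - post.mean(axis=0)) / post.std(axis=0)
    # "odd indices" interpreted 0-based here -- an assumption
    return post[1::2]
```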
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory used for running its experiments. Mentions of "thousands of CPU hours" refer to the problem domain (computational cardiology), not the authors' experimental setup.
Software Dependencies | No | See App. P for supplementary details and https://github.com/microsoft/goodpoints for a Python implementation of kernel thinning and code replicating each vignette. The paper identifies Python as the implementation language but provides no version numbers for Python or any key libraries or dependencies.
Experiment Setup | Yes | To output a coreset of size n^(1/2) from n input points, we (a) take every n^(1/2)-th point for standard thinning and (b) run kernel thinning (KT) with m = (1/2) log2(n) using a standard thinning coreset as the base coreset in kt-swap. For each input sample size n ∈ {2^4, 2^6, ..., 2^14, 2^16} with δ_i = 1/(2n), we report the mean coreset error MMD_k(P, S) ± 1 standard error across 10 independent replications of the experiment (the standard errors are too small to be visible in all experiments). The parameter σ for the Gaussian kernel is set to the median distance between all pairs of 47 points obtained by standard thinning the post-burn-in odd indices. The selected burn-in periods for the Goodwin task were 820,000 for RW; 824,000 for Ada-RW; 1,615,000 for MALA; and 1,475,000 for P-MALA. The respective numbers for the Lotka–Volterra task were 1,512,000 for RW; 1,797,000 for Ada-RW; 1,573,000 for MALA; and 1,251,000 for P-MALA.
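Two baseline ingredients of this setup are easy to sketch: standard thinning down to a size-n^(1/2) coreset, and the median-pairwise-distance heuristic for the Gaussian-kernel bandwidth σ. This is a minimal illustration under the assumption that n is a perfect square (as with n = 2^4, ..., 2^16 above); the function names are illustrative and this is not the goodpoints implementation.

```python
import numpy as np

def standard_thin(points):
    """Standard thinning: keep every n^(1/2)-th point, yielding a coreset
    of size n^(1/2). Assumes n is a perfect square, as in the experiments."""
    n = len(points)
    step = int(round(np.sqrt(n)))
    return points[step - 1 :: step]

def median_heuristic(points):
    """Gaussian-kernel bandwidth sigma set to the median distance over
    all distinct pairs of the given points (the paper's sigma rule)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    return np.median(dists[np.triu_indices(len(points), k=1)])
```

For example, from n = 2^8 = 256 input points, `standard_thin` keeps every 16th point and returns a coreset of 16 points, which could then serve as the base coreset handed to kt-swap.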