Enhanced gradient-based MCMC in discrete spaces

Authors: Benjamin Rhodes, Michael U. Gutmann

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the newly proposed methods NCG, AVG & PAVG on four problem types: 1) sampling from highly correlated ordinal mixture distributions, 2) a sparse Bayesian variable selection problem, 3) estimation of Ising models, and 4) sampling a deep energy-based model parameterised by a convolutional neural network. Our key baselines are Gibbs-with-Gradients (GWG) (Grathwohl et al., 2021) and a standard Gibbs sampler (Geman & Geman, 1984).
Researcher Affiliation | Academia | Benjamin Rhodes EMAIL University of Edinburgh; Michael U. Gutmann EMAIL University of Edinburgh
Pseudocode | Yes | Algorithm B.1 NCG step; Algorithm B.2 AVG step; Algorithm B.3 PAVG step; Algorithm F.4 Adaptive learning of preconditioning matrix (default values in brackets are used across all experiments); Algorithm F.5 Adapt γ to maximise jump distance ∥s_t − s_{t−1}∥_1; Algorithm L.6 Persistent contrastive divergence with buffer
Open Source Code | No | This vectorised PyTorch code will be accessible upon publication.
Open Datasets | Yes | We apply this methodology to the USPS 256-dimensional image dataset of binarised handwritten digits (Hull, 1994).
Dataset Splits | No | The paper does not provide specific dataset splits such as train/test/validation percentages or counts for the input data. It describes how MCMC chains are run (e.g., "Run 100 parallel chains for 10 minutes with a burn-in period of 1 minute") and the size of a generated dataset (e.g., "The dataset D consists of 10,000 samples"), but these are not splits of an original dataset for model training/evaluation.
Hardware Specification | No | The paper mentions "efficient GPU acceleration" in Appendix B but does not provide specific details on the GPU models or any other hardware specifications used for the experiments.
Software Dependencies | Yes | tfp.mcmc.effective_sample_size(S, filter_beyond_positive_pairs=True) using version 0.14.1 of tensorflow-probability. We use version 0.9.8 of the igraph package.
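The effective-sample-size diagnostic quoted above can be approximated without the TensorFlow dependency. The sketch below is a plain-NumPy estimate that truncates the autocorrelation sum at the first non-positive pair of autocorrelations, roughly analogous to passing filter_beyond_positive_pairs=True to tfp.mcmc.effective_sample_size; the function name and the exact truncation rule are illustrative assumptions, not the paper's code.

```python
import numpy as np

def effective_sample_size(chain):
    """Estimate the ESS of a 1-D MCMC chain.

    Uses ESS = n / (1 + 2 * sum of autocorrelations), with the sum
    truncated at the first consecutive pair of autocorrelations whose
    sum is non-positive (in the spirit of Geyer's initial positive
    sequence, which filter_beyond_positive_pairs=True approximates).
    """
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Empirical autocovariance at each lag, normalised to autocorrelation.
    acov = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(n)])
    rho = acov / acov[0]
    # Accumulate lag-pair sums (rho_1 + rho_2), (rho_3 + rho_4), ...
    # while they remain positive; stop at the first non-positive pair.
    s, t = 0.0, 1
    while t + 1 < n:
        pair = rho[t] + rho[t + 1]
        if pair <= 0:
            break
        s += pair
        t += 2
    return n / (1.0 + 2.0 * s)
```

For an i.i.d. chain this returns close to the chain length, while a strongly autocorrelated chain (e.g. AR(1) with coefficient 0.95) yields a much smaller ESS.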
Experiment Setup | Yes | Our grid-search based tuning procedure involves running each sampler for a short amount of time (1000 iterations, which takes 1 minute in most of our experiments) with different step-sizes, and selecting the step-size that maximises the average L1 distance ∥s_{t+1} − s_t∥_1 between successive states (averaged over all time-steps and parallel chains). For NCG, AVG & PAVG we first identify the best order of magnitude by searching, in parallel, over the 5 values in the set {0.05, 0.5, 5.0, 50.0, 500.0}. We set Niters = 2,000, Nbatch = 50, Nbuffer = 5,000, ϵ = 0.0003. We use weight decay of 0.0001 on the neural net weights.
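The grid-search tuning procedure described above can be sketched as follows. The names average_jump_distance, tune_step_size and toy_sampler are hypothetical, and the toy sampler merely stands in for the paper's actual samplers (NCG, AVG, PAVG), which are not reproduced here; the selection criterion (average L1 jump distance between successive states, averaged over time-steps and parallel chains) follows the quoted description.

```python
import numpy as np

def average_jump_distance(states):
    """Mean L1 distance ||s_{t+1} - s_t||_1 between successive states,
    averaged over all time-steps and parallel chains.
    states has shape (n_iters, n_chains, dim)."""
    diffs = np.abs(np.diff(states, axis=0)).sum(axis=-1)  # (n_iters-1, n_chains)
    return diffs.mean()

def tune_step_size(run_sampler, step_sizes, n_iters=1000):
    """Grid search: run the sampler briefly at each candidate step-size and
    keep the one maximising the average L1 jump distance.
    run_sampler is a callable (step_size, n_iters) -> states array."""
    scores = {eps: average_jump_distance(run_sampler(eps, n_iters))
              for eps in step_sizes}
    return max(scores, key=scores.get)

def toy_sampler(eps, n_iters):
    """Stand-in sampler over binary states: the flip probability rises with
    the step-size, so the jump distance peaks at an intermediate eps."""
    rng = np.random.default_rng(0)
    p = eps / (eps + 50.0)  # p = 0.5 (maximal jumps) when eps = 50
    return (rng.random((n_iters, 4, 16)) < p).astype(float)
```

With the order-of-magnitude grid {0.05, 0.5, 5.0, 50.0, 500.0} from the paper, the toy sampler's jump distance is maximised at eps = 50, so tune_step_size selects that value.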