Enhanced gradient-based MCMC in discrete spaces
Authors: Benjamin Rhodes, Michael U. Gutmann
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the newly proposed methods NCG, AVG & PAVG on four problem types: 1) sampling from highly correlated ordinal mixture distributions 2) a sparse Bayesian variable selection problem 3) estimation of Ising models and 4) sampling a deep energy-based model parameterised by a convolutional neural network. Our key baselines are Gibbs-with-Gradients (GWG) Grathwohl et al. (2021) and a standard Gibbs sampler (Geman & Geman, 1984). |
| Researcher Affiliation | Academia | Benjamin Rhodes EMAIL University of Edinburgh Michael U. Gutmann EMAIL University of Edinburgh |
| Pseudocode | Yes | Algorithm B.1 NCG step; Algorithm B.2 AVG step; Algorithm B.3 PAVG step; Algorithm F.4 Adaptive learning of preconditioning matrix. (Default values in brackets are used across all experiments); Algorithm F.5 Adapt γ to maximise jump distance ∥s_t − s_{t−1}∥_1; Algorithm L.6 Persistent contrastive divergence with buffer |
| Open Source Code | No | This vectorised PyTorch code will be accessible upon publication. |
| Open Datasets | Yes | We apply this methodology to the USPS 256-dimensional image dataset of binarised handwritten digits (Hull, 1994) |
| Dataset Splits | No | The paper does not provide specific dataset splits like train/test/validation percentages or counts for the input data. It describes how MCMC chains are run (e.g., "Run 100 parallel chains for 10 minutes with a burn-in period of 1 minute") and the size of a generated dataset (e.g., "The dataset D consists of 10,000 samples"), but these are not splits of an original dataset for model training/evaluation. |
| Hardware Specification | No | The paper mentions "efficient GPU acceleration" in Appendix B but does not provide specific details on the GPU models or any other hardware specifications used for the experiments. |
| Software Dependencies | Yes | tfp.mcmc.effective_sample_size(S, filter_beyond_positive_pairs=True) using version 0.14.1 of tensorflow-probability. We use version 0.9.8 of the igraph package. |
| Experiment Setup | Yes | Our grid-search based tuning procedure involves running each sampler for a short amount of time (1000 iterations, which takes 1 minute in most of our experiments) with different step-sizes, and selecting the step-size that maximises the average L1-distance ∥s_{t+1} − s_t∥_1 between successive states (averaged over all time-steps and parallel chains). For NCG, AVG & PAVG we first identify the best order-of-magnitude by searching, in parallel, over the 5 values in the set {0.05, 0.5, 5.0, 50.0, 500.0}. We set Niters = 2,000, Nbatch = 50, Nbuffer = 5000, ϵ = 0.0003. We use weight decay of 0.0001 on the neural net weights. |
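The quoted tuning rule (grid search over step-sizes, keeping the one that maximises the average L1 jump distance over all time-steps and parallel chains) can be sketched as below. This is a minimal illustration, not the paper's implementation: the chain here is a toy random-walk Metropolis sampler on a continuous standard Gaussian, standing in for the discrete NCG/AVG/PAVG samplers, and the function names `average_l1_jump` and `tune_step_size` are hypothetical.

```python
import numpy as np

def average_l1_jump(step_size, n_iters=1000, n_chains=100, dim=16, seed=0):
    """Return the mean L1 distance ||s_{t+1} - s_t||_1 between successive
    states, averaged over all time-steps and parallel chains.

    Toy stand-in dynamics: random-walk Metropolis on a standard Gaussian
    target (the paper's discrete samplers would replace this loop body).
    """
    rng = np.random.default_rng(seed)
    s = rng.standard_normal((n_chains, dim))

    def logp(x):
        # Unnormalised log-density of a standard Gaussian target.
        return -0.5 * (x ** 2).sum(axis=1)

    total = 0.0
    for _ in range(n_iters):
        prop = s + step_size * rng.standard_normal((n_chains, dim))
        # Metropolis accept/reject, vectorised over chains.
        accept = np.log(rng.random(n_chains)) < logp(prop) - logp(s)
        new = np.where(accept[:, None], prop, s)
        total += np.abs(new - s).sum(axis=1).mean()  # per-step mean L1 jump
        s = new
    return total / n_iters

def tune_step_size(candidates=(0.05, 0.5, 5.0, 50.0, 500.0)):
    """Coarse grid search over the paper's order-of-magnitude candidates,
    selecting the step-size with the largest average jump distance."""
    scores = {eps: average_l1_jump(eps) for eps in candidates}
    return max(scores, key=scores.get)
```

The jump-distance criterion penalises both extremes: tiny step-sizes move slowly even when always accepted, while huge ones are almost always rejected, so the average L1 jump peaks at an intermediate value.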