Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients

Authors: Yuansi Chen, Raaz Dwivedi, Martin J. Wainwright, Bin Yu

JMLR 2020

Reproducibility Variable Result LLM Response
Research Type | Experimental | In this section, we numerically compare HMC with MALA and MRW to verify that our suggested step-size and leapfrog steps lead to faster convergence for the HMC algorithm. We simulate 100 independent runs of the four chains (MRW, MALA, HMC, HMCagg), and for each chain at every iteration we compute the quantile error across the 100 samples from 100 independent runs of that chain. We compute the minimum number of total function and gradient evaluations required for the relative quantile error to fall below δ = 0.04. We repeat this computation 10 times and report the averaged number of total function and gradient evaluations in Figure 2.
Researcher Affiliation | Academia | Yuansi Chen (EMAIL), Department of Statistics, University of California, Berkeley, CA 94720-1776, USA; Raaz Dwivedi (EMAIL), Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720-1776, USA; Martin J. Wainwright (EMAIL), Department of Statistics and Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720-1776, USA; Bin Yu (EMAIL), Department of Statistics and Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720-1776, USA
Pseudocode | Yes | Algorithm 1: Metropolized Random Walk (MRW); Algorithm 2: Metropolis-adjusted Langevin algorithm (MALA); Algorithm 3: Metropolized HMC with leapfrog integrator
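To make the pseudocode row concrete, here is a minimal sketch of a single Metropolized HMC step with a K-step leapfrog integrator, in the spirit of Algorithm 3. The interface (`log_p`, `grad_log_p`, `eta`, `K`) and all names are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def hmc_step(x, log_p, grad_log_p, eta, K, rng):
    """One Metropolized HMC step: resample momentum, run K leapfrog
    steps with step size eta, then accept/reject via Metropolis.
    A sketch of the scheme in Algorithm 3, not the paper's implementation."""
    p0 = rng.standard_normal(x.shape[0])      # fresh Gaussian momentum
    q, p = x.copy(), p0.copy()
    p += 0.5 * eta * grad_log_p(q)            # initial half step for momentum
    for k in range(K):
        q += eta * p                          # full step for position
        if k < K - 1:
            p += eta * grad_log_p(q)          # full momentum step (interior)
    p += 0.5 * eta * grad_log_p(q)            # final half step for momentum
    # Metropolis correction with Hamiltonian H(q, p) = -log_p(q) + |p|^2 / 2
    log_alpha = (log_p(q) - 0.5 * p @ p) - (log_p(x) - 0.5 * p0 @ p0)
    if np.log(rng.uniform()) < log_alpha:
        return q
    return x
```

With a small step size on a well-conditioned target, the leapfrog integrator nearly conserves the Hamiltonian, so most proposals are accepted; this is the mechanism behind the multi-step-gradient benefit the paper analyzes.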
Open Source Code | No | The paper does not provide any concrete access information for source code. It mentions general software packages that use HMC (Stan, Mamba, TensorFlow) in the introduction, but not a release of the authors' own implementation.
Open Datasets | Yes | In this simulation, we check the dimension d dependency and condition number κ dependency in the multivariate Gaussian case under our step-size choices. We consider sampling from the multivariate Gaussian distribution with density π(x) ∝ exp(−(1/2) xᵀ Σ⁻¹ x) (24) for some covariance matrix Σ ∈ R^{d×d}. ... The Hessian Σ in the multivariate Gaussian distribution is chosen to be diagonal and the square roots of its eigenvalues are linearly spaced between 1.0 to 2.0.
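The Gaussian target quoted above is fully specified, so the "dataset" is easy to reproduce. A minimal sketch, with variable names that are assumptions, builds the diagonal covariance whose square-rooted eigenvalues are linearly spaced between 1.0 and 2.0 (so the condition number is 4):

```python
import numpy as np

d = 16                                  # dimension (illustrative; the paper varies d)
sqrt_eigs = np.linspace(1.0, 2.0, d)    # square roots of the eigenvalues of Sigma
Sigma = np.diag(sqrt_eigs**2)           # diagonal covariance matrix
kappa = sqrt_eigs[-1]**2 / sqrt_eigs[0]**2  # condition number = 4.0
Sigma_inv = np.diag(1.0 / sqrt_eigs**2)

def log_density(x):
    # pi(x) proportional to exp(-0.5 x^T Sigma^{-1} x); log up to an additive constant
    return -0.5 * (x @ Sigma_inv @ x)
```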
Dataset Splits | No | The paper focuses on sampling from analytically defined distributions and evaluates mixing times using numerical simulations. It does not involve traditional datasets with training, validation, or test splits typical of supervised learning tasks.
Hardware Specification | No | The paper describes numerical experiments in Section 4 but does not specify any hardware details such as GPU models, CPU types, or memory configurations used for these experiments.
Software Dependencies | No | The paper mentions software packages like Stan, Mamba, and TensorFlow as examples of HMC use, but it does not specify any particular software libraries, frameworks, or their version numbers that were used by the authors for their own experimental work.
Experiment Setup | Yes | For every case of simulation, the parameters for HMC-(K, η) are chosen according to the warm start case in Corollary 2 with K = 4 d^{1/4}, and for MRW and MALA are chosen according to the paper Dwivedi et al. (2019). As alluded to earlier, we also run the HMC chain with a more aggressive choice of parameters, and denote this chain by HMCagg. For HMCagg, both the step-size and leapfrog steps are larger (Appendix D.1.1): K = 4 d^{1/8} κ^{1/4}, where we take into account that L_H is zero for the Gaussian distribution. We simulate 100 independent runs of the four chains... We compute the minimum number of total function and gradient evaluations required for the relative quantile error to fall below δ = 0.04. We repeat this computation 10 times.
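The stopping criterion quoted above (relative quantile error below δ = 0.04, measured across 100 independent runs) can be sketched as follows. The exact error definition and all names here are assumptions, since the authors' evaluation code is not released; `evals_per_iter` stands for the per-iteration cost (e.g. K gradient evaluations plus a density evaluation for HMC with K leapfrog steps).

```python
import numpy as np

def evals_to_tolerance(samples, true_quantile, q, delta, evals_per_iter):
    """Given samples[run, iteration] from independent runs, return the
    minimum total number of function/gradient evaluations needed for the
    relative error of the empirical q-quantile (taken across runs) to
    fall below delta, or None if the tolerance is never reached."""
    n_runs, n_iters = samples.shape
    for t in range(n_iters):
        est = np.quantile(samples[:, t], q)   # empirical quantile across runs
        if abs(est - true_quantile) / abs(true_quantile) < delta:
            return (t + 1) * evals_per_iter
    return None
```

Repeating this measurement over independent replications and averaging, as the quoted setup does 10 times, would give the numbers reported in the paper's Figure 2.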