Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling

Authors: Wei Guo, Molei Tao, Yongxin Chen

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct simple numerical experiments to verify the findings in Ex. 2, which demonstrate that for a certain class of mixtures of Gaussian distributions, ALMC achieves polynomial convergence with respect to r. Specifically, we consider a target distribution comprising a mixture of 6 Gaussians with uniform weights 1/6, means {(r cos(kπ/3), r sin(kπ/3)) : k ∈ [[0, 5]]}, and the same covariance 0.1I (corresponding to β = 10 in Ex. 2). We experimented with r ∈ {2, 5, 10, 15, 20, 25, 30}, and computed the number of iterations required for the empirical KL divergence from the target distribution to the sampled distribution to fall below 0.2 (blue curve) and 0.1 (orange curve). The results, displayed in Fig. 2, use a log-log scale for both axes. The near-linear behavior of the curves in this plot confirms that the complexity depends polynomially on r, validating our theory.
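The quoted target, a uniform mixture of six Gaussians with means on a circle of radius r and shared covariance 0.1I, can be sketched in NumPy as below. This is an illustrative reconstruction, not the paper's code; the function names (`mixture_params`, `log_density`, `score`) are ours.

```python
import numpy as np

def mixture_params(r, sigma2=0.1, K=6):
    """Means and shared variance of the K-component Gaussian mixture.

    Means lie on a circle of radius r at angles k*pi/3, k = 0..5,
    matching the target distribution described in the experiment.
    """
    angles = np.arange(K) * 2 * np.pi / K   # k * pi/3 for K = 6
    means = r * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return means, sigma2

def log_density(x, means, sigma2):
    """Unnormalized log-density of the uniform-weight mixture at points x (n, 2)."""
    # Squared distances from each point to each component mean: shape (n, K)
    d2 = ((x[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    # Numerically stable log-sum-exp over components (uniform weights drop out
    # up to an additive constant)
    m = d2.min(axis=1, keepdims=True)
    return -m[:, 0] / (2 * sigma2) + np.log(
        np.exp(-(d2 - m) / (2 * sigma2)).sum(axis=1)
    )

def score(x, means, sigma2):
    """Gradient of the log-density (the score), which Langevin updates need."""
    d2 = ((x[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * sigma2))
    w = w / w.sum(axis=1, keepdims=True)   # posterior component weights
    return (w[:, :, None] * (means[None, :, :] - x[:, None, :])).sum(1) / sigma2
```

Near a mode the posterior weights concentrate on a single component, so the score reduces to that of one Gaussian; at the origin the six symmetric components cancel and the score vanishes.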
Researcher Affiliation Academia Wei Guo, Molei Tao, Yongxin Chen (Georgia Institute of Technology)
Pseudocode Yes Algorithm 1: Annealed LMC Algorithm
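The overall loop of an annealed LMC sampler can be sketched as follows. This is a minimal sketch under an assumption, not the paper's exact Algorithm 1: it supposes the annealed target at time θ adds a quadratic confinement term λ(θ)‖x‖²/2 to the potential, so that λ(θ) = 5(1 − θ)^10 shrinks the confinement to zero as θ → 1 and the sampler ends on the true target. The helper names are hypothetical.

```python
import numpy as np

def annealed_lmc(score_fn, n_samples, M, step_fn,
                 lam=lambda t: 5 * (1 - t) ** 10, dim=2, rng=None):
    """Sketch of annealed Langevin Monte Carlo.

    score_fn(x) returns the score (gradient of log-density) of the target;
    step_fn(l, M) returns the step size at iteration l; lam(theta) is the
    assumed quadratic-confinement strength along the annealing path.
    """
    rng = np.random.default_rng(rng)
    x = rng.standard_normal((n_samples, dim))   # initialize from a Gaussian
    for l in range(1, M + 1):
        theta = l / M                           # annealing parameter in (0, 1]
        h = step_fn(l, M)                       # per-iteration step size
        # Score of the annealed target: target score minus confinement gradient
        g = score_fn(x) - lam(theta) * x
        # Euler-Maruyama discretization of the Langevin diffusion
        x = x + h * g + np.sqrt(2 * h) * rng.standard_normal(x.shape)
    return x
```

As a smoke test, running it with `score_fn=lambda x: -x` (a standard 2-D Gaussian target) and a constant step size returns samples with mean near 0 and variance near 1.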
Open Source Code No The paper does not provide any concrete access to source code for the methodology described. It mentions third-party tools like 'Information Theoretical Estimators (ITE) toolbox' and 'sklearn.linear_model.LinearRegression' but does not provide code for its own implementation.
Open Datasets No Specifically, we consider a target distribution comprising a mixture of 6 Gaussians with uniform weights 1/6, means {(r cos(kπ/3), r sin(kπ/3)) : k ∈ [[0, 5]]}, and the same covariance 0.1I (corresponding to β = 10 in Ex. 2). We experimented with r ∈ {2, 5, 10, 15, 20, 25, 30}...
Dataset Splits No The paper describes generating a synthetic mixture of Gaussian distributions for experiments but does not mention any training, validation, or test splits for this data.
Hardware Specification No The paper mentions 'We conduct simple numerical experiments' but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for these experiments.
Software Dependencies No We use the Information Theoretical Estimators (ITE) toolbox (Szabó, 2014) to empirically estimate the KL divergence. In all cases, we generate 1000 samples using ALMC, and an additional 1000 samples from the target distribution. The KL divergence is estimated using the ite.cost.BDKL_KnnK() function, which leverages k-nearest-neighbor techniques to compute the divergence from the target distribution to the sampled distribution. We also compute the linear regression approximation of the curves in Fig. 2 via sklearn.linear_model.LinearRegression.
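The k-nearest-neighbor KL estimate that ite.cost.BDKL_KnnK() computes can be reproduced in a few lines of NumPy. The sketch below implements the standard Wang-Kulkarni-Verdú nearest-neighbor estimator, which is the same idea behind the ITE class; the function name `knn_kl` is ours, not the toolbox's.

```python
import numpy as np

def knn_kl(x, y, k=1):
    """k-NN estimate of KL(P || Q) from samples x ~ P (n, d) and y ~ Q (m, d).

    Wang-Kulkarni-Verdu estimator: compares the k-NN distance of each x_i
    within the x sample (rho) to its k-NN distance into the y sample (nu).
    """
    n, d = x.shape
    m = y.shape[0]
    # Pairwise Euclidean distances; mask the zero self-distances in x-to-x
    dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dxx, np.inf)
    dxy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    rho = np.sort(dxx, axis=1)[:, k - 1]   # k-NN distance within x
    nu = np.sort(dxy, axis=1)[:, k - 1]    # k-NN distance from x into y
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

On two samples from the same distribution the estimate is close to zero, and it grows when the two distributions separate; the ITE class adds refinements (tree-based neighbor search, alternative k) that this sketch omits.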
Experiment Setup Yes For all the experiments, we use the annealing schedule λ(θ) = 5(1 − θ)^10 and η(θ) ≡ 1. The step size for ALMC follows a quadratic schedule: the step size at the ℓ-th iteration (out of M total iterations, ℓ ∈ [[1, M]]) is given by (s_min − s_max)(2ℓ/M − 1)^2 + s_max, which increases on [0, M/2] and decreases on [M/2, M], with a maximum of s_max and a minimum of s_min. For r = 2, 5, 10, 15, 20, 25, 30, the M we choose are 200, 500, 2500, 10000, 20000, 40000, 60000, respectively, and we set s_max = 0.05 and s_min = 0.01, which achieve the best performance across all settings after tuning.
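The quadratic step-size schedule, reconstructed from its stated properties (minimum s_min at the endpoints, maximum s_max at ℓ = M/2), might look like the sketch below; the exact parameterization in the paper may differ.

```python
def quadratic_step(l, M, s_max=0.05, s_min=0.01):
    """Quadratic step-size schedule: s_min at l = 0 and l = M, s_max at l = M/2.

    Reconstructed from the schedule's stated properties (rise on [0, M/2],
    fall on [M/2, M], extremes s_max and s_min); illustrative only.
    """
    t = 2 * l / M - 1                  # maps [0, M] onto [-1, 1]
    return s_max - (s_max - s_min) * t ** 2
```

Warming the step size up and then cooling it down lets the chain move quickly through the mid-annealing phase while keeping discretization error small near initialization and near the final target.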