Pathfinder: Parallel quasi-Newton variational inference

Authors: Lu Zhang, Bob Carpenter, Andrew Gelman, Aki Vehtari

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors.
Researcher Affiliation | Collaboration | Lu Zhang (EMAIL), Division of Biostatistics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90032, USA; Bob Carpenter (EMAIL), Center for Computational Mathematics, Flatiron Institute, 162 5th Ave, New York, NY 10010, USA; Andrew Gelman (EMAIL), Departments of Statistics and Political Science, Columbia University, New York, NY 10027, USA; Aki Vehtari (EMAIL), Department of Computer Science, Aalto University, 00076 Aalto, Finland
Pseudocode | Yes | Algorithm 1: Single-path Pathfinder...; Algorithm 2: Multi-path Pathfinder...; Algorithm 3: Diagonal inverse Hessian estimation...; Algorithm 4: Sample from local approximations...; Algorithm 5: Pareto-smoothed importance resampling (PS-IR)...; Algorithm 6: ELBO estimation...; Algorithm 7: L-BFGS.
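To make the role of the ELBO estimation step (Algorithm 6) concrete, here is a minimal sketch, not the paper's implementation: a Monte Carlo ELBO estimate over K draws from a Gaussian approximation q, as used to rank the local approximations along the optimization path. The toy target `log_p` and all function names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def elbo_estimate(log_p, q_mean, q_cov, K=5, rng=None):
    """Monte Carlo ELBO estimate E_q[log p(theta) - log q(theta)],
    averaged over K draws from the Gaussian approximation q
    (K = 5 is the paper's default setting)."""
    rng = np.random.default_rng(rng)
    q = multivariate_normal(mean=q_mean, cov=q_cov)
    draws = np.atleast_2d(q.rvs(size=K, random_state=rng))
    return np.mean([log_p(th) - q.logpdf(th) for th in draws])

# Toy target: a standard bivariate normal log density.
log_p = multivariate_normal(mean=np.zeros(2), cov=np.eye(2)).logpdf

# When q matches the target exactly, the ELBO is 0; a mismatched
# approximation scores lower, which is how the best point on the
# optimization path is selected.
exact = elbo_estimate(log_p, np.zeros(2), np.eye(2), K=5, rng=0)
shifted = elbo_estimate(log_p, np.ones(2), np.eye(2), K=5, rng=0)
print(exact, shifted)
```

The same estimator is evaluated once per point on the L-BFGS trajectory, and the approximation with the highest estimated ELBO is kept.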
Open Source Code | Yes | The code for simulations is available at https://github.com/LuZhangstat/Pathfinder.
Open Datasets | Yes | Over a diverse set of 20 models from the posteriordb evaluation set (Magnusson et al., 2021), we found Pathfinder's approximations ranged from slightly worse to much better than those of ADVI using diagonal covariance (mean field), ADVI with dense covariance (full rank), and dynamic HMC using short chains (75 iterations).
Dataset Splits | No | The paper uses the posteriordb evaluation set, which provides models and reference posteriors, but it does not specify explicit training, validation, or test splits for the data within these models. The experiments primarily involve comparing generated approximate posterior samples against provided reference posterior samples, rather than training models on partitioned datasets.
Hardware Specification | No | The paper mentions parallelization using 'multiple cores' but does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments.
Software Dependencies | Yes | We use Stan's implementation of ADVI and dynamic HMC (Stan Development Team, 2021a). We use the L-BFGS-B implementation in the R function stats::optim() (R Core Team, 2021). We use the function wasserstein() from the R package transport (Schuhmacher et al., 2020) to calculate the 1-Wasserstein distance between two sets of draws. The paper also cites SciPy: "SciPy 1.0: Fundamental algorithms for scientific computing in Python." Nature Methods, 17(3):261-272, 2020.
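The paper computes 1-Wasserstein distances between multivariate draws with R's transport::wasserstein(); as a rough one-dimensional analogue for a single marginal, SciPy's scipy.stats.wasserstein_distance can be used. The toy draws below are assumptions for illustration, not the paper's data.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=10_000)     # reference posterior draws
approx = rng.normal(0.5, 1.0, size=10_000)  # approximate draws with a shifted mean

# For two normals with equal scale, the 1-Wasserstein distance equals
# the mean shift, so the sample estimate here should be close to 0.5.
d = wasserstein_distance(ref, approx)
print(d)
```

Lower values indicate that the approximate draws are closer to the reference posterior, which is the comparison metric used throughout the paper's evaluation.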
Experiment Setup | Yes | For each model in posteriordb, we run single-path Pathfinder with 100 different random initializations, using our proposed default settings: maximum number of L-BFGS iterations (Lmax = 1000), relative tolerance for L-BFGS convergence (τrel = 10^-13), size of the L-BFGS history used to approximate the inverse Hessian (J = 6), number of Monte Carlo draws to evaluate the ELBO (K = 5), and number of draws per run (M = 100). For multi-path Pathfinder, we again take 100 approximate draws, but use a larger number of intermediate runs: number of single-path Pathfinder runs (I = 20), number of draws returned by each single-path run (M = 100), and number of final draws (R = 100). For Stan phase I adaptation: adaptive Hamiltonian Monte Carlo with Stan's no-U-turn sampler (unit metric, step size adaptation, and a maximum tree depth of 10, keeping the last of 75 iterations).
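The L-BFGS defaults above map naturally onto SciPy's L-BFGS-B options (maxiter for Lmax, maxcor for the history size J, ftol for the relative tolerance τrel). The sketch below, on an assumed toy Gaussian target rather than the paper's Stan/R implementation, also records the optimization trajectory, which is the sequence of iterates that single-path Pathfinder turns into local Gaussian approximations.

```python
import numpy as np
from scipy.optimize import minimize

# Toy target: negative log density of a correlated 2-D Gaussian.
A = np.array([[2.0, 0.8],
              [0.8, 1.0]])  # precision matrix (assumption for the example)

def neg_log_density(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

path = []  # one iterate per L-BFGS step, as Pathfinder's trajectory
res = minimize(
    neg_log_density,
    x0=np.array([3.0, -2.0]),
    jac=grad,
    method="L-BFGS-B",
    callback=path.append,
    options={
        "maxiter": 1000,  # analogous to Lmax = 1000
        "maxcor": 6,      # analogous to history size J = 6
        "ftol": 1e-13,    # analogous to relative tolerance 1e-13
    },
)
print(res.x, len(path))
```

Each recorded iterate, together with the L-BFGS curvature pairs, is what Pathfinder uses to build a local Gaussian approximation and evaluate its ELBO.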