Pathfinder: Parallel quasi-Newton variational inference
Authors: Lu Zhang, Bob Carpenter, Andrew Gelman, Aki Vehtari
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. |
| Researcher Affiliation | Collaboration | Lu Zhang (EMAIL), Division of Biostatistics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90032, USA; Bob Carpenter (EMAIL), Center for Computational Mathematics, Flatiron Institute, 162 5th Ave, New York, NY 10010, USA; Andrew Gelman (EMAIL), Departments of Statistics and Political Science, Columbia University, New York, NY 10027, USA; Aki Vehtari (EMAIL), Department of Computer Science, Aalto University, 00076 Aalto, Finland |
| Pseudocode | Yes | Algorithm 1 Single-path Pathfinder... Algorithm 2 Multi-path Pathfinder... Algorithm 3 Diagonal inverse Hessian estimation... Algorithm 4 Sample from local approximations... Algorithm 5 Pareto-smoothed importance resampling (PS-IR)... Algorithm 6 ELBO estimation... Algorithm 7 L-BFGS. |
| Open Source Code | Yes | The code for simulations is available at https://github.com/LuZhangstat/Pathfinder. |
| Open Datasets | Yes | Over a diverse set of 20 models from the posteriordb evaluation set (Magnusson et al., 2021), we found Pathfinder's approximations ranged from slightly worse to much better than those of ADVI using diagonal covariance (mean field), ADVI with dense covariance (full rank), and dynamic HMC using short chains (75 iterations). |
| Dataset Splits | No | The paper uses the posteriordb evaluation set, which provides models and reference posteriors, but it does not specify explicit training, validation, or test splits for the data within these models. The experiments primarily involve comparing generated approximate posterior samples against provided reference posterior samples, rather than training models on partitioned datasets. |
| Hardware Specification | No | The paper mentions parallelization using 'multiple cores' but does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | Yes | We use Stan's implementation of ADVI and dynamic HMC (Stan Development Team, 2021a). We use the L-BFGS-B implementation in the R function stats::optim() (R Core Team, 2021). We use the function wasserstein() from the R package transport (Schuhmacher et al., 2020) to calculate the 1-Wasserstein distance between two sets of draws. The paper also cites SciPy 1.0: fundamental algorithms for scientific computing in Python (Nature Methods, 17(3):261–272, 2020). |
| Experiment Setup | Yes | For each model in posteriordb, we run single-path Pathfinder with 100 different random initializations, using our proposed default settings: maximum L-BFGS iterations (Lmax = 1000), relative tolerance for L-BFGS convergence (τrel = 10⁻¹³), size of L-BFGS history used to approximate the inverse Hessian (J = 6), number of Monte Carlo draws to evaluate the ELBO (K = 5), and number of draws per run (M = 100). For multi-path Pathfinder, we again take 100 approximate draws, but use a larger number of intermediate runs: number of single-path Pathfinder runs (I = 20), number of draws returned by each single-path Pathfinder run (M = 100), and number of draws per run (R = 100). For Stan phase I adaptation: adaptive Hamiltonian Monte Carlo with Stan's no-U-turn sampler (unit metric, step size adaptation, and a maximum tree depth of 10, keeping the last of 75 iterations). |
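The table notes that approximation quality is measured by 1-Wasserstein distance, computed in the paper with `wasserstein()` from the R package `transport`. As a minimal sketch of the same comparison in Python, `scipy.stats.wasserstein_distance` gives the 1-D analogue; the draws below are hypothetical stand-ins, not the paper's posteriordb reference draws.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Hypothetical draws: a reference posterior marginal vs. an
# approximation to it (stand-ins for posteriordb reference draws
# and 100 Pathfinder draws).
reference_draws = rng.normal(loc=0.0, scale=1.0, size=1000)
approx_draws = rng.normal(loc=0.1, scale=1.1, size=100)

# 1-Wasserstein distance between the two empirical distributions.
# Note: SciPy's function is univariate; the paper's R
# transport::wasserstein computes the distance between
# multivariate point clouds, so this is only a per-marginal analogue.
w1 = wasserstein_distance(reference_draws, approx_draws)
print(f"1-Wasserstein distance: {w1:.3f}")
```

A smaller distance indicates approximate draws closer to the reference posterior, which is the sense in which the paper ranks Pathfinder against ADVI and short dynamic HMC chains.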
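The L-BFGS defaults in the experiment setup (Lmax = 1000, J = 6, τrel = 10⁻¹³) can be exercised with SciPy's L-BFGS-B, which exposes corresponding options. The paper itself uses R's `stats::optim()`, so the option mapping below and the toy target are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed mapping of Pathfinder's reported L-BFGS defaults onto
# SciPy's L-BFGS-B options:
#   Lmax = 1000     -> maxiter (maximum iterations)
#   J = 6           -> maxcor  (history size for the inverse-Hessian approx.)
#   tau_rel = 1e-13 -> ftol    (relative convergence tolerance)
opts = {"maxiter": 1000, "maxcor": 6, "ftol": 1e-13}

# Toy target: negative log density of a standard normal in 2-D,
# a stand-in for a posteriordb model's log density.
def neg_log_density(x):
    return 0.5 * np.dot(x, x)

def grad(x):
    return x

x0 = np.array([3.0, -2.0])  # one random initialization
res = minimize(neg_log_density, x0, jac=grad,
               method="L-BFGS-B", options=opts)
print(res.x)  # the optimization path ends near the mode at the origin
```

Pathfinder evaluates variational approximations along this optimization trajectory rather than only at its endpoint, which is why the L-BFGS history size (J = 6) matters: it controls the low-rank inverse-Hessian estimate used to build each local normal approximation.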