Pathfinder: Parallel quasi-Newton variational inference

Authors: Lu Zhang, Bob Carpenter, Andrew Gelman, Aki Vehtari

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors.
Researcher Affiliation | Collaboration | Lu Zhang (EMAIL), Division of Biostatistics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90032, USA; Bob Carpenter (EMAIL), Center for Computational Mathematics, Flatiron Institute, 162 5th Ave, New York, NY 10010, USA; Andrew Gelman (EMAIL), Departments of Statistics and Political Science, Columbia University, New York, NY 10027, USA; Aki Vehtari (EMAIL), Department of Computer Science, Aalto University, 00076 Aalto, Finland
Pseudocode | Yes | Algorithm 1: Single-path Pathfinder...; Algorithm 2: Multi-path Pathfinder...; Algorithm 3: Diagonal inverse Hessian estimation...; Algorithm 4: Sample from local approximations...; Algorithm 5: Pareto-smoothed importance resampling (PS-IR)...; Algorithm 6: ELBO estimation...; Algorithm 7: L-BFGS.
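To make the role of the ELBO estimation step (Algorithm 6) concrete, here is a minimal sketch, not the paper's implementation: a Monte Carlo ELBO estimate over K draws from a Gaussian approximation q, as used to rank the local approximations along the optimization path. The toy target `log_p` and all function names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def elbo_estimate(log_p, q_mean, q_cov, K=5, rng=None):
    """Monte Carlo ELBO estimate E_q[log p(theta) - log q(theta)],
    averaged over K draws from the Gaussian approximation q
    (K = 5 is the paper's default setting)."""
    rng = np.random.default_rng(rng)
    q = multivariate_normal(mean=q_mean, cov=q_cov)
    draws = np.atleast_2d(q.rvs(size=K, random_state=rng))
    return np.mean([log_p(th) - q.logpdf(th) for th in draws])

# Toy target: a standard bivariate normal log density.
log_p = multivariate_normal(mean=np.zeros(2), cov=np.eye(2)).logpdf

# When q matches the target exactly, the ELBO is 0; a mismatched
# approximation scores lower, which is how the best point on the
# optimization path is selected.
exact = elbo_estimate(log_p, np.zeros(2), np.eye(2), K=5, rng=0)
shifted = elbo_estimate(log_p, np.ones(2), np.eye(2), K=5, rng=0)
print(exact, shifted)
```

The same estimator is evaluated once per point on the L-BFGS trajectory, and the approximation with the highest estimated ELBO is kept.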
Open Source Code | Yes | The code for simulations is available at https://github.com/LuZhangstat/Pathfinder.
Open Datasets | Yes | Over a diverse set of 20 models from the posteriordb evaluation set (Magnusson et al., 2021), we found Pathfinder's approximations ranged from slightly worse to much better than those of ADVI using diagonal covariance (mean field), ADVI with dense covariance (full rank), and dynamic HMC using short chains (75 iterations).
Dataset Splits | No | The paper uses the posteriordb evaluation set, which provides models and reference posteriors, but it does not specify explicit training, validation, or test splits for the data within these models. The experiments primarily involve comparing generated approximate posterior samples against provided reference posterior samples, rather than training models on partitioned datasets.
Hardware Specification | No | The paper mentions parallelization using 'multiple cores' but does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments.
Software Dependencies | Yes | We use Stan's implementation of ADVI and dynamic HMC (Stan Development Team, 2021a). We use the L-BFGS-B implementation in the R function stats::optim() (R Core Team, 2021). We use the function wasserstein() from the R package transport (Schuhmacher et al., 2020) to calculate the 1-Wasserstein distance between two sets of draws. The paper also cites SciPy: "SciPy 1.0: Fundamental algorithms for scientific computing in Python." Nature Methods, 17(3):261-272, 2020.
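The paper computes 1-Wasserstein distances between multivariate draws with R's transport::wasserstein(); as a rough one-dimensional analogue for a single marginal, SciPy's scipy.stats.wasserstein_distance can be used. The toy draws below are assumptions for illustration, not the paper's data.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=10_000)     # reference posterior draws
approx = rng.normal(0.5, 1.0, size=10_000)  # approximate draws with a shifted mean

# For two normals with equal scale, the 1-Wasserstein distance equals
# the mean shift, so the sample estimate here should be close to 0.5.
d = wasserstein_distance(ref, approx)
print(d)
```

Lower values indicate that the approximate draws are closer to the reference posterior, which is the comparison metric used throughout the paper's evaluation.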
Experiment Setup | Yes | For each model in posteriordb, we run single-path Pathfinder with 100 different random initializations, using our proposed default settings: maximum number of L-BFGS iterations (Lmax = 1000), relative tolerance for L-BFGS convergence (τrel = 10^-13), size of the L-BFGS history used to approximate the inverse Hessian (J = 6), number of Monte Carlo draws to evaluate the ELBO (K = 5), and number of draws per run (M = 100). For multi-path Pathfinder, we again take 100 approximate draws, but use a larger number of intermediate runs: number of single-path Pathfinder runs (I = 20), number of draws returned by each single-path run (M = 100), and number of final draws (R = 100). For Stan phase I adaptation: adaptive Hamiltonian Monte Carlo with Stan's no-U-turn sampler (unit metric, step size adaptation, and a maximum tree depth of 10, keeping the last of 75 iterations).
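The L-BFGS defaults above map naturally onto SciPy's L-BFGS-B options (maxiter for Lmax, maxcor for the history size J, ftol for the relative tolerance τrel). The sketch below, on an assumed toy Gaussian target rather than the paper's Stan/R implementation, also records the optimization trajectory, which is the sequence of iterates that single-path Pathfinder turns into local Gaussian approximations.

```python
import numpy as np
from scipy.optimize import minimize

# Toy target: negative log density of a correlated 2-D Gaussian.
A = np.array([[2.0, 0.8],
              [0.8, 1.0]])  # precision matrix (assumption for the example)

def neg_log_density(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

path = []  # one iterate per L-BFGS step, as Pathfinder's trajectory
res = minimize(
    neg_log_density,
    x0=np.array([3.0, -2.0]),
    jac=grad,
    method="L-BFGS-B",
    callback=path.append,
    options={
        "maxiter": 1000,  # analogous to Lmax = 1000
        "maxcor": 6,      # analogous to history size J = 6
        "ftol": 1e-13,    # analogous to relative tolerance 1e-13
    },
)
print(res.x, len(path))
```

Each recorded iterate, together with the L-BFGS curvature pairs, is what Pathfinder uses to build a local Gaussian approximation and evaluate its ELBO.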