reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Pareto Smoothed Importance Sampling

Authors: Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao, Jonah Gabry

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present a new method for stabilizing importance weights using a generalized Pareto distribution fit to the upper tail of the distribution of the simulated importance ratios. The method, which empirically performs better than existing methods for stabilizing importance sampling estimates, includes stabilized effective sample size estimates, Monte Carlo error estimates, and convergence diagnostics. The presented Pareto k̂ finite sample convergence rate diagnostic is useful for any Monte Carlo estimator.
Researcher Affiliation	Collaboration	Aki Vehtari EMAIL Department of Computer Science Aalto University; Daniel Simpson EMAIL Normal Computing; Andrew Gelman EMAIL Departments of Statistics and Political Science Columbia University; Yuling Yao EMAIL Center for Computational Mathematics Flatiron Institute; Jonah Gabry EMAIL Department of Statistics Columbia University.
Pseudocode	Yes	Algorithm 1: PSIS procedure for computing importance weights.
Open Source Code	No	The text mentions that PSIS forms the basis for the widely-used loo R package (Vehtari et al., 2017, 2024), and is implemented in the posterior R package (Bürkner et al., 2024), ArviZ.py and ArviZ.jl (Kumar et al., 2019), and the Pyro Python package (Bingham et al., 2019). It also provides a link to the 'GPstuff toolbox' code at https://github.com/gpstuff-dev/gpstuff, but this is a tool used in an example, not the authors' implementation of the methods described in this paper. The paper does not include an explicit statement from the authors that their code for this paper's methodology is open source, nor a direct link to their repository.
Open Datasets	Yes	We repeat the density estimation using the Galaxy data set 1000 times with different random seeds. The Galaxy data set is referenced with a URL: https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/galaxies.html. The breast cancer tumor data set (Section 5.2.2) is also stated to be publicly available: The data were published by Aure et al. (2015) and are publicly available; we used the preprocessed data as described by Aittomäki (2016).
Dataset Splits	Yes	Importance-sampling Leave-one-out Cross-validation... For this example, there were 352 such cases for the Gaussian models and 53 for the Student-t models, and the computation for these took 42 hours. Although combining PSIS-LOO with exact LOO for certain points substantially increases the computation time in this example, it is still less than the time required for 10-fold-CV.
Hardware Specification	Yes	Computation time for MCMC inference was about half an hour and computation time for split-normal with importance sampling was about 1.3 s (laptop with Intel Core i5-4300U CPU @ 1.90GHz x 4)... For 4000 posterior draws, the computation for one gene and one model took about 9 minutes (desktop Intel Xeon CPU E3-1231 v3 @ 3.40GHz x 8), which is reasonable speed.
Software Dependencies	Yes	The PSIS and Pareto k̂ diagnostic have been also implemented, for example, in the posterior R package (Bürkner et al., 2024)... This paper focuses on self-normalized importance sampling, but Pareto k̂ diagnostic and Pareto smoothing can be used also for ordinary importance sampling or any Monte Carlo estimate, as demonstrated in some of the references discussed in Section 6. Beyond the examples in the latter part of this paper, PSIS forms the basis for the widely-used loo R package for stable, high-dimensional leave-one-out cross-validation (Vehtari et al., 2017, 2024)... The Stan Modeling Language: User's Guide and Reference Manual, 2017. Version 2.16.0, https://mc-stan.org/. ...projpred: Projection predictive feature selection, 2023. URL https://mc-stan.org/projpred/. R package version 2.8.0. ... adjustr: Stan model adjustments and sensitivity analyses using importance sampling. R package version 0.1.2. Available online at: https://corymccartan.github.io/adjustr/ (accessed December 8, 2021), 2021.
Experiment Setup	Yes	The model has 400 latent values, that is, the posterior is 400-dimensional, although due to a strong dependency imposed by the Gaussian process prior the effective dimensionality is smaller. Because of this, it is sufficient that the split-normal is scaled only along the first 50 principal component axes... For 4000 posterior draws, the computation for one gene and one model took about 9 minutes... We assumed a multivariate linear model for the effects with a Gaussian prior and used Stan (Stan Development Team, 2017) to fit the model... data { int<lower=0> N; int<lower=0> p; vector[N] y; matrix[N,p] x; }... parameters { real beta0; vector[p] beta; real<lower=0> sigmasq; real<lower=0> phi; }... model { beta0 ~ normal(0, 100); phi ~ cauchy(0, sd_y); beta ~ normal(0, phi); sigmasq ~ inv_gamma(0.1, 0.1); y ~ normal(mu, sigma); }
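
The Pseudocode row points to the paper's Algorithm 1 without reproducing it. As a rough illustration of the idea quoted in the Research Type row (fitting a generalized Pareto distribution to the upper tail of the importance ratios), the following Python sketch may help. It is not the authors' code. Several details are assumptions that do not appear in the excerpts above: the tail-size rule M = min(0.2 S, 3 sqrt(S)), the replacement of the M largest ratios by fitted quantiles at probabilities (z - 1/2)/M, and the truncation at the largest raw ratio. The fit uses a profile-likelihood grid estimator in the spirit of Zhang and Stephens (2009), chosen for this sketch. The names `fit_gpd` and `pareto_smooth` are hypothetical, and the code assumes distinct, positive ratios and works on raw ratios rather than in log space.

```python
import math
import random


def fit_gpd(exceedances):
    """Estimate GPD shape k and scale sigma from positive exceedances.

    Assumption: a profile-likelihood grid over theta = -k / sigma, averaged
    with likelihood weights (in the spirit of Zhang and Stephens, 2009).
    """
    x = sorted(exceedances)
    n = len(x)
    grid_size = 30 + int(math.sqrt(n))
    x_star = x[int(n / 4 + 0.5) - 1]  # roughly the first quartile
    thetas = [
        1.0 / x[-1] + (1.0 - math.sqrt(grid_size / (j - 0.5))) / (3.0 * x_star)
        for j in range(1, grid_size + 1)
    ]

    def shape_given(theta):
        return sum(math.log1p(-theta * xi) for xi in x) / n

    def profile_loglik(theta):
        k = shape_given(theta)
        return n * (math.log(-theta / k) - k - 1.0)

    logliks = [profile_loglik(t) for t in thetas]
    top = max(logliks)  # subtract the maximum before exponentiating
    weights = [math.exp(value - top) for value in logliks]
    theta_hat = sum(t * w for t, w in zip(thetas, weights)) / sum(weights)
    k_hat = shape_given(theta_hat)
    sigma_hat = -k_hat / theta_hat
    return k_hat, sigma_hat


def pareto_smooth(ratios):
    """Return (smoothed ratios, k_hat) for a list of raw importance ratios."""
    s = len(ratios)
    m = int(min(0.2 * s, 3.0 * math.sqrt(s)))  # assumed tail-size rule
    order = sorted(range(s), key=lambda i: ratios[i])
    cutoff = ratios[order[s - m - 1]]  # largest ratio outside the tail
    tail_idx = order[s - m:]
    k_hat, sigma_hat = fit_gpd([ratios[i] - cutoff for i in tail_idx])

    largest = ratios[order[-1]]
    smoothed = list(ratios)
    for z, i in enumerate(tail_idx, start=1):
        p = (z - 0.5) / m
        if abs(k_hat) < 1e-12:
            # Exponential limit of the GPD quantile function as k -> 0.
            q = cutoff - sigma_hat * math.log1p(-p)
        else:
            q = cutoff + sigma_hat / k_hat * ((1.0 - p) ** (-k_hat) - 1.0)
        smoothed[i] = min(q, largest)  # never exceed the largest raw ratio
    return smoothed, k_hat


if __name__ == "__main__":
    random.seed(0)
    # Synthetic heavy-tailed ratios with an exact Pareto tail (true k = 0.5).
    raw = [(1.0 - random.random()) ** -0.5 for _ in range(5000)]
    weights, k_hat = pareto_smooth(raw)
    total = sum(weights)
    normalized = [w / total for w in weights]  # self-normalized weights
    print(f"Pareto k-hat estimate: {k_hat:.2f}")
```

For real analyses, the packages named in the Open Source Code and Software Dependencies rows (loo, posterior, ArviZ, Pyro) are the implementations to rely on; this sketch only illustrates the shape of the computation.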