Automatic Differentiation Variational Inference

Authors: Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study advi across ten modern probabilistic models and apply it to a dataset with millions of observations. ... Section 3 studies the properties of advi. We explore its accuracy, its stochastic nature, and its sensitivity to transformations. Section 4 applies advi to an array of probability models. We compare its speed to mcmc sampling techniques and present a case study using a dataset with millions of observations.
Researcher Affiliation | Academia | Alp Kucukelbir EMAIL Data Science Institute, Department of Computer Science, Columbia University, New York, NY 10027, USA; Dustin Tran EMAIL Department of Computer Science, Columbia University, New York, NY 10027, USA; Rajesh Ranganath EMAIL Department of Computer Science, Princeton University, Princeton, NJ 08540, USA; Andrew Gelman EMAIL Data Science Institute, Departments of Political Science and Statistics, Columbia University, New York, NY 10027, USA; David M. Blei EMAIL Data Science Institute, Departments of Computer Science and Statistics, Columbia University, New York, NY 10027, USA
Pseudocode | Yes | Algorithm 1: Automatic differentiation variational inference (advi)
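Algorithm 1 itself is not reproduced in this summary. As a rough illustration of the reparameterized, single-sample (M = 1) gradient step it describes, here is a toy one-dimensional sketch; the target density, hand-coded gradients, initialization, and step-size schedule are our own choices for illustration, not the paper's:

```python
import numpy as np

def advi_toy(num_iters=20000, seed=0):
    """Minimal one-dimensional sketch of an ADVI-style gradient step.

    Target: log p(theta) = log N(theta; 0, 1).  Variational family:
    q(theta) = N(mu, exp(omega)^2), so omega lives on the real line and
    no constraining transformation is needed.  The ELBO gradient is
    estimated with a single reparameterized sample (M = 1), as in the
    paper; everything else here is a toy assumption.
    """
    rng = np.random.default_rng(seed)
    mu, omega = 1.0, 0.5                      # deliberately off target
    for i in range(1, num_iters + 1):
        z = rng.standard_normal()
        theta = mu + np.exp(omega) * z        # reparameterization trick
        grad_logp = -theta                    # d/dtheta log N(theta; 0, 1)
        grad_mu = grad_logp                   # chain rule: dtheta/dmu = 1
        grad_omega = grad_logp * z * np.exp(omega) + 1.0  # + entropy grad
        rho = 0.5 / (20.0 + i)                # satisfies Robbins-Monro
        mu += rho * grad_mu
        omega += rho * grad_omega
    return mu, np.exp(omega)

mu_hat, sigma_hat = advi_toy()
```

For this toy the exact posterior is N(0, 1), so the fitted mean and standard deviation can be checked directly against 0 and 1.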
Open Source Code | Yes | We implement and deploy advi as part of Stan, a probabilistic programming system (Stan Development Team, 2016). ... Appendix E. Running advi in Stan. Visit http://mc-stan.org/ to download the latest version of Stan. Follow instructions on how to install Stan.
Open Datasets | Yes | A dataset of trajectories is publicly available: it contains all 1.7 million taxi rides taken during the year 2014 (European Conference of Machine Learning, 2015). ... We use the Frey Faces dataset, which contains 1956 frames (28 × 20 pixels) of facial expressions extracted from a video sequence. ... We explore the imageclef dataset, which has 250 000 images (Villegas et al., 2013). ... a polling dataset from the United States 1988 presidential election (Gelman and Hill, 2006).
Dataset Splits | Yes | Linear regression with ard ... We use 10 000 data points for training and withhold 1000 for evaluation. ... Logistic regression with a spatial hierarchical prior ... We use 10 000 data points for training and withhold 1536 for evaluation. ... Gaussian Mixture Model ... We withhold 10 000 images for evaluation.
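The splits above are plain held-out evaluation sets. A hypothetical helper mirroring the linear-regression split (10 000 training points, 1000 withheld) might look like the following; the paper does not say how held-out indices were chosen, so the random permutation is an assumption:

```python
import numpy as np

def train_eval_split(n_total, n_eval, seed=0):
    """Hypothetical split helper for the paper's held-out evaluations
    (e.g. 10 000 training points with 1000 withheld for linear
    regression with ard).  The random permutation is our assumption,
    not a documented detail of the paper.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)          # shuffle all indices once
    return idx[n_eval:], idx[:n_eval]       # (train, evaluation)

train_idx, eval_idx = train_eval_split(11_000, 1_000)
```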
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It mentions the general speed of the methods but not the underlying hardware.
Software Dependencies | Yes | We implement and deploy advi as part of Stan, a probabilistic programming system (Stan Development Team, 2016). ... The first is in PyMC3 (Salvatier et al., 2016), a probabilistic programming package, which implements advi in Python using Theano. The second is in Edward (Tran et al., 2016a), a Python library for probabilistic modeling, inference, and criticism, which implements advi in Python using TensorFlow.
Experiment Setup | Yes | A single sample suffices. (We set M = 1 from here on.) ... The results in Figure 10a use a0 = b0 = c0 = d0 = 1 as hyper-parameters for the Gamma priors. ... The regression coefficient β has a Normal(0, 10) prior and all standard deviation latent variables have half-Normal(0, 10) priors. ... We set K = 10 and all the Gamma hyper-parameters to 1 in our experiments. ... We set K = 10, α0 = 1000 for each component, and λ0 = 0.1. ... With a minibatch size of 500 or larger, advi reaches high predictive accuracy. ... We set ϵ = 10^-16, a small value that guarantees that the step-size sequence satisfies the Robbins and Monro (1951) conditions. The weighting factor α ∈ (0, 1) defines a compromise of old and new gradient information, which we set to 0.1. ... we set τ = 1. ... We adaptively tune η by searching over η ∈ {0.01, 0.1, 1, 10, 100} using a subset of the data and selecting the value that leads to the fastest convergence (Bottou, 2012).
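The quoted constants (ϵ = 10^-16, α = 0.1, τ = 1, and the η grid) parameterize the paper's adaptive step-size sequence. A sketch of that schedule, under our reading of the paper's description, is below; the constants are quoted, but the exact functional form here should be treated as an assumption:

```python
import numpy as np

def advi_step_sizes(grads, eta=0.1, alpha=0.1, tau=1.0, eps=1e-16):
    """Sketch of the adaptive step-size sequence the quotes describe.

    s is an exponentially weighted average of squared gradients with
    weighting factor alpha = 0.1; the i**(-0.5 + eps) decay with
    eps = 1e-16 keeps the sequence within the Robbins-Monro conditions;
    tau = 1 damps the earliest iterations; eta is the scale chosen by
    grid search over {0.01, 0.1, 1, 10, 100}.  The functional form is
    our reading of the paper, not a verbatim transcription.
    """
    s, rhos = None, []
    for i, g in enumerate(grads, start=1):
        g2 = float(g) ** 2
        s = g2 if s is None else alpha * g2 + (1.0 - alpha) * s
        rhos.append(eta * i ** (-0.5 + eps) / (tau + np.sqrt(s)))
    return rhos

rhos = advi_step_sizes([1.0, 1.0, 1.0, 1.0])
# with a constant unit gradient the sequence decays like (eta / 2) * i**-0.5
```

Because ϵ is tiny, the decay is effectively i^(-1/2), while the moving average adapts the scale per coordinate in the full algorithm.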