A Framework for Improving the Reliability of Black-box Variational Inference
Authors: Manushi Welandawe, Michael Riis Andersen, Aki Vehtari, Jonathan H. Huggins
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the robustness and accuracy of RABVI through carefully designed simulation studies and on a diverse set of real-world model and data examples. Keywords: black-box variational inference, symmetrized KL divergence, stochastic optimization, fixed-learning rate |
| Researcher Affiliation | Academia | Manushi Welandawe EMAIL Department of Mathematics & Statistics Boston University, USA Michael Riis Andersen EMAIL DTU Compute Technical University of Denmark, Denmark Aki Vehtari EMAIL Department of Computer Science Aalto University, Finland Jonathan H. Huggins EMAIL Department of Mathematics & Statistics Faculty of Computing & Data Sciences Boston University, USA |
| Pseudocode | Yes | Algorithm 1: Fixed learning-rate automated stochastic optimization (FASO) ... Algorithm 2: Robust and automated black-box variational inference (RABVI) |
| Open Source Code | Yes | We make RABVI available as part of the open source Python package VIABEL (https://github.com/jhuggins/viabel). |
| Open Datasets | Yes | To validate the robustness and reliability of RABVI across realistic use cases, we consider 18 diverse data set/model pairs found in the posteriordb package (https://github.com/stan-dev/posteriordb; see Appendix C for details). |
| Dataset Splits | No | The paper uses existing datasets (posteriordb) for evaluating inference algorithms but does not specify explicit training/test/validation splits for model development or evaluation within its own experiments. The evaluation focuses on approximating posterior distributions against ground-truth estimates. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions that RABVI is available as an 'open source Python package VIABEL' and refers to using 'Stan' for fitting regression models and obtaining ground truth. However, it does not provide specific version numbers for Python, Stan, or any other software libraries, which are necessary for reproducibility. |
| Experiment Setup | Yes | When γ is fixed, our proposal from Section 5 is summarized in Algorithm 1, which we call fixed learning-rate automated stochastic optimization (FASO). Combining the termination rule from Section 4 with FASO, we get our complete framework, robust and automated black-box variational inference (RABVI), which we summarize in Algorithm 2. We will verify the robustness of RABVI through numerical experiments. RABVI is automatic since the user is only required to provide a target distribution, and the only tuning parameters we recommend changing from their defaults are defined on interpretable, intuitive scales: accuracy threshold ξ: ... Our experiments suggest ξ = 0.1 is a good default value. inefficiency threshold τ: We recommend setting the inefficiency threshold τ = 1... maximum number of iterations Kmax: ... initial learning rate γ0: ... We use γ0 = 0.3 in all of our experiments. minimum window size Wmin: We recommend taking Wmin = 200... small iteration number K0: ... We use K0 = 5Wmin = 1000 for our experiments... initial iterate average relative error threshold ε0: ... we take ε0 = ξ by default. adaptation factor ρ: We recommend taking ρ = 0.5... Monte Carlo samples M: We find that M = 10 provides a good balance... Unless stated otherwise, all experiments use avg Adam to compute the descent direction, mean-field Gaussian distributions as the variational family, and the tuning parameter values recommended in Section 6. We fit the regression model for C (and κ) in Stan, which results in an extremely small computational overhead of less than 0.5%. We compare RABVI to FASO, Stan's ADVI implementation, SGD using an exponential decay of the learning rate, and fixed learning rate versions of RMSProp, Adam, and a windowed version of Adagrad (WAdagrad), which is the default optimizer in PyMC3. Moreover, we compare RABVI with exponential decay and cosine learning rate schedules using Adam and RMSProp optimization methods. We run all the algorithms that do not have a termination criterion for Kmax = 100,000 iterations, and for the fixed learning-rate algorithms we use a learning rate of γ = 0.01 in an effort to balance speed with accuracy. For exponential decay, we use a learning rate of γ = γ0 · δ^(k/s), where γ0 denotes the initial learning rate, δ denotes the decay rate, k denotes the iteration, and s denotes the decay step. We choose γ0 = 0.01, δ = 0.96, and s = 900 so that the final learning rate is approximately 0.0001 (Chen et al., 2017). For the cosine schedule, we use a learning rate of γ = γmin + (1/2)(γmax − γmin)(1 + cos(kπ/K)), where γmin and γmax denote the minimum and maximum learning rates, k denotes the current iteration, and K denotes the maximum number of iterations (Loshchilov and Hutter, 2017). We choose γmin = 0.0001, γmax = 0.01 to make it comparable with the other methods. |
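The two comparison schedules in the experiment-setup row follow standard formulas: exponential decay γ = γ0 · δ^(k/s) and cosine annealing γ = γmin + (1/2)(γmax − γmin)(1 + cos(kπ/K)). A minimal sketch, using only the constants quoted from the paper (γ0 = 0.01, δ = 0.96, s = 900, Kmax = 100,000, γmin = 0.0001, γmax = 0.01); the function names are illustrative, not part of the VIABEL API:

```python
import math

def exp_decay_lr(k, gamma0=0.01, delta=0.96, s=900):
    """Exponential decay: gamma = gamma0 * delta**(k / s)."""
    return gamma0 * delta ** (k / s)

def cosine_lr(k, K=100_000, gamma_min=0.0001, gamma_max=0.01):
    """Cosine schedule: gamma_min + 0.5*(gamma_max - gamma_min)*(1 + cos(k*pi/K))."""
    return gamma_min + 0.5 * (gamma_max - gamma_min) * (1 + math.cos(k * math.pi / K))

# Both schedules start at 0.01. After Kmax = 100,000 iterations the
# exponential schedule reaches gamma0 * 0.96**(100000/900) ~ 1e-4,
# matching the paper's stated final learning rate, while the cosine
# schedule bottoms out at gamma_min = 0.0001.
for k in (0, 50_000, 100_000):
    print(k, exp_decay_lr(k), cosine_lr(k))
```

This makes the "comparable with the other methods" choice concrete: both schedules sweep the same range, from 0.01 down to roughly 0.0001, over the same iteration budget.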