U-Statistics for Importance-Weighted Variational Inference

Authors: Javier Burroni, Kenta Takatsu, Justin Domke, Daniel Sheldon

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We find empirically that U-statistic variance reduction can lead to modest to significant improvements in inference performance on a range of models, with little computational cost. We demonstrate on a diverse set of inference problems that U-statistic-based variance reduction for the IW-ELBO either does not change, or leads to modest to significant gains in, black-box VI performance, with no substantive downsides. We empirically show that U-statistic-based estimators also reduce variance during IWAE training and lead to models with higher training objective values when used with either the standard gradient estimator or the doubly-reparameterized gradient (DReG) estimator (Tucker et al., 2018). For black-box IWVI, we experiment with two kinds of models: Bayesian logistic regression with 5 different UCI datasets (Dua & Graff, 2017) using both diagonal and full covariance Gaussian variational distributions, and a suite of 12 statistical models from the Stan example models (Stan Development Team, 2021; Carpenter et al., 2017).
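To make the U-statistic idea quoted above concrete: given n log-importance-weights, the standard IW-ELBO estimator averages log-mean-exp over one disjoint partition into groups of size m, while the complete U-statistic averages over all size-m subsets. A minimal NumPy sketch (function names are illustrative, not from the paper's code):

```python
import numpy as np
from itertools import combinations

def iw_elbo_standard(logw, m):
    """Standard estimator: split the n log-weights into n//m
    disjoint groups and average log-mean of weights per group."""
    groups = np.asarray(logw, dtype=float).reshape(-1, m)
    return float(np.mean(np.log(np.mean(np.exp(groups), axis=1))))

def iw_elbo_complete_u(logw, m):
    """Complete U-statistic: average over all C(n, m) size-m
    subsets (tractable only for small n; the paper uses cheaper
    incomplete variants for this reason)."""
    vals = [np.log(np.mean(np.exp(np.array(s))))
            for s in combinations(logw, m)]
    return float(np.mean(vals))
```

Both estimators are unbiased for the same IW-ELBO with m samples; because the U-statistic averages over every size-m subset rather than one partition, its variance is never larger.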
Researcher Affiliation Academia Javier Burroni (EMAIL), University of Massachusetts Amherst; Kenta Takatsu (EMAIL), Carnegie Mellon University; Justin Domke (EMAIL), University of Massachusetts Amherst; Daniel Sheldon (EMAIL), University of Massachusetts Amherst
Pseudocode No The paper defines estimators (Estimator 1, Estimator 2, etc.) and theoretical propositions, but it does not include any clearly labeled pseudocode blocks or algorithms in a structured, code-like format.
Open Source Code No The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a direct link to a code repository. It mentions using PyTorch and Pyro's DReG implementation but does not offer its own code.
Open Datasets Yes For black-box IWVI, we experiment with two kinds of models: Bayesian logistic regression with 5 different UCI datasets (Dua & Graff, 2017) using both diagonal and full covariance Gaussian variational distributions, and a suite of 12 statistical models from the Stan example models (Stan Development Team, 2021; Carpenter et al., 2017). To evaluate the performance of the proposed methods on IWAEs, we trained IWAEs on 4 different datasets: MNIST, KMNIST, FMNIST, and Omniglot.
Dataset Splits Yes
  Dataset   Dim  Train + Test   Source
  MNIST     784  60000 + 10000  LeCun et al. (2010)
  FMNIST    784  60000 + 10000  Fashion-MNIST, Xiao et al. (2017)
  KMNIST    784  60000 + 10000  Kuzushiji-MNIST, Clanuwat et al. (2018)
  Omniglot  784  24345 + 8070   Lake et al. (2015), from Burda et al. (2016)
Hardware Specification No To get consistent wall-clock time measurements, we trained only using CPU on dedicated servers, with disabled hyper-threading and a single task per core. This statement describes the general environment but lacks specific CPU models, clock speeds, or other detailed hardware specifications.
Software Dependencies No We used SGD with 15 different learning rates... We used the reparameterization gradient estimator as the base gradient estimator, and also provide in Appendix D and G (very similar) results for the doubly-reparameterized (DReG) gradient estimator. For a randomly-sampled Dirichlet distribution with 50 parameters, we approximate it using a (50 − 1)-dimensional Gaussian distribution parameterized with a full rank covariance matrix, with its domain constrained to the simplex using PyTorch's distributions (Paszke et al., 2019). We trained each combination of dataset, method, and value of m using five different random seeds, and the optimization was run for 100 epochs using Adam (Kingma & Ba, 2015). Our implementation of DReG is based on Pyro's (Bingham et al., 2018) not-yet-integrated implementation.
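For context on the "domain constrained to the simplex" detail: PyTorch's distributions library provides a stick-breaking bijection from R^(K−1) to the interior of the K-simplex. A NumPy sketch of one common form of this transform, offered as an illustration rather than PyTorch's exact implementation:

```python
import numpy as np

def stick_breaking(y):
    """Map an unconstrained vector y in R^(K-1) to the interior of
    the K-simplex by peeling off a fraction of the remaining 'stick'
    at each coordinate (illustrative sketch, not PyTorch's code)."""
    y = np.asarray(y, dtype=float)
    K = y.size + 1
    x = np.empty(K)
    remaining = 1.0
    for k in range(K - 1):
        # the log(K - k - 1) offset makes y = 0 map to the uniform
        # point (1/K, ..., 1/K)
        frac = 1.0 / (1.0 + np.exp(-(y[k] - np.log(K - k - 1))))
        x[k] = remaining * frac
        remaining -= x[k]
    x[-1] = remaining
    return x
```

Composing a 49-dimensional full-covariance Gaussian with such a bijection gives a variational distribution supported on the 50-simplex, matching the Dirichlet setup quoted above.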
Experiment Setup Yes For each model, the variational parameters were optimized using stochastic gradient descent with fixed learning rate for 15 different logarithmically spaced learning rates. We used n = 16 samples per iteration except for the running time analysis, and experimented with m in {2, 4, 8}. To evaluate the performance of the proposed methods on IWAEs, we trained IWAEs on 4 different datasets: MNIST, KMNIST, FMNIST, and Omniglot. We compare the standard IW-ELBO estimator and DReG estimators to their permuted versions, i.e., the permuted and permuted-DReG estimators. We also evaluate the second-order approximation to the complete-U-statistic estimator. We trained each combination of dataset, method, and value of m using five different random seeds, and the optimization was run for 100 epochs using Adam (Kingma & Ba, 2015). In all cases, we used a batch size of 500, and a latent variable of dimension 50, while taking n = 50 samples. Datasets were taken from PyTorch, except for Omniglot, for which we used the construction provided by Burda et al. (2016). We evaluated using the standard IW-ELBO estimator, regardless of the estimator used for the optimization. ... with a learning rate of 10^-4.
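The "permuted" estimator compared in the setup above can be read as an incomplete U-statistic: average the disjoint-block IW-ELBO estimator over a few random permutations of the n samples. A hedged NumPy sketch, with the function name and the n_perm parameter chosen for illustration:

```python
import numpy as np

def iw_elbo_permuted(logw, m, n_perm=4, rng=None):
    """Incomplete U-statistic (sketch): shuffle the n log-weights,
    split into n//m disjoint groups, average log-mean of weights
    per group; repeat for n_perm permutations and average."""
    rng = np.random.default_rng() if rng is None else rng
    logw = np.asarray(logw, dtype=float)
    vals = []
    for _ in range(n_perm):
        groups = rng.permutation(logw).reshape(-1, m)
        vals.append(np.mean(np.log(np.mean(np.exp(groups), axis=1))))
    return float(np.mean(vals))
```

Each permutation reuses the same n weights, so the extra cost over the standard estimator is only the cheap log-mean-exp reductions, consistent with the paper's claim of little computational overhead.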