PAVI: Plate-Amortized Variational Inference

Authors: Louis Rouillard, Alexandre Le Bris, Thomas Moreau, Demian Wassermann

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate the practical utility of PAVI through a challenging Neuroimaging example featuring 400 million latent parameters, demonstrating a significant step towards scalable and expressive Variational Inference." In the experiments, we show how PAVI unlocks hierarchical Bayesian model inference for large-scale problems by matching the inference quality of SOTA methods while providing faster convergence and lighter parameterization. Our experiments also highlight the differences between our two encoding schemes, PAVI-E and PAVI-F. ELBO metric: throughout, we use the ELBO as a proxy for the KL divergence between the variational posterior and the unknown true posterior (Blei et al., 2017). The ELBO is measured across 20 samples of X, with 5 repetitions per sample, and allows us to compare the relative performance of different architectures on a given inference problem.
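To make the ELBO-as-metric protocol concrete, here is a minimal numpy sketch of the Monte Carlo estimate E_q[log p(x, θ) − log q(θ)] on a toy conjugate Gaussian model. The model, the variational parameters, and the sample counts below are illustrative assumptions, not the paper's actual setup; the point is only the shape of the estimator and the averaging over repeated estimates.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy model: p(theta) = N(0, 1), p(x | theta) = N(theta, 1), observed x = 0.5.
# The true posterior is N(0.25, 0.5), so the estimate can be sanity-checked.
x_obs = 0.5

# A deliberately mismatched variational posterior q(theta) = N(mu, sigma^2),
# so the ELBO sits strictly below the log evidence.
mu, sigma = 0.3, 0.8

def elbo_estimate(n_samples: int) -> float:
    """Monte Carlo estimate of E_q[log p(x, theta) - log q(theta)]."""
    theta = rng.normal(mu, sigma, size=n_samples)
    log_joint = norm.logpdf(theta, 0.0, 1.0) + norm.logpdf(x_obs, theta, 1.0)
    log_q = norm.logpdf(theta, mu, sigma)
    return float(np.mean(log_joint - log_q))

# Average over repeated estimates, mirroring the repeated-measurement protocol.
elbo = float(np.mean([elbo_estimate(n_samples=8) for _ in range(5)]))
print(elbo)
```

Because q and the model are Gaussian here, the estimate can be checked against the exact log evidence log N(0.5; 0, 2) ≈ −1.33 minus the KL between q and the true posterior.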
Researcher Affiliation | Academia | Louis Rouillard (EMAIL); Alexandre Le Bris (EMAIL); Thomas Moreau (EMAIL); Demian Wassermann (EMAIL). All four: Université Paris-Saclay, Inria, CEA, Palaiseau, 91120, France.
Pseudocode | Yes | Appendix A.3, PAVI algorithms: Algorithm 1, PAVI architecture build; Algorithm 2, PAVI stochastic training; Algorithm 3, PAVI inference.
Open Source Code | Yes | As part of our submission, we furthermore packaged and released the code associated with our experiments.
Open Datasets | Yes | We use the HCP dataset (Van Essen et al., 2012): 2 acquisitions from 1,000 subjects, with millions of measurements per acquisition, and over 400 million parameters Θ to infer.
Dataset Splits | Yes | For the cognitive score prediction in Figure 4, we reproduced the standard methodology from Kong et al. (2019): a 20-fold cross-validation across 1,000 subjects. Starting from the logits_{s,n} associated with each subject, we use PCA to project the features from the 19 training folds onto their 100 first components (with 33% explained variance). We then train a linear regression to predict each of the 13 cognitive scores from the training-fold PCA features. Test performance is computed on the held-out fold using the training-fold PCA and linear regression, averaged across the 13 cognitive measures.
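The split protocol in this row (20-fold CV; PCA fitted on the 19 training folds only; one linear regression for the 13 scores) can be sketched with scikit-learn. The data below are random stand-ins for the HCP logits and cognitive scores (shapes chosen for illustration), and the per-score correlation metric is an assumption; the row itself does not name the test metric.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins (shapes only, not real HCP data):
# a per-subject feature vector of logits, and 13 cognitive scores per subject.
n_subjects, n_features, n_scores = 1000, 300, 13
logits = rng.normal(size=(n_subjects, n_features))
scores = rng.normal(size=(n_subjects, n_scores))

kf = KFold(n_splits=20, shuffle=True, random_state=0)
fold_corrs = []
for train_idx, test_idx in kf.split(logits):
    # Fit PCA on the 19 training folds only, then project both splits,
    # so no test information leaks into the features.
    pca = PCA(n_components=100).fit(logits[train_idx])
    z_train = pca.transform(logits[train_idx])
    z_test = pca.transform(logits[test_idx])

    # One linear regression predicting all 13 scores from the PCA features.
    reg = LinearRegression().fit(z_train, scores[train_idx])
    pred = reg.predict(z_test)

    # Test performance averaged across the 13 cognitive measures.
    corr = [np.corrcoef(pred[:, j], scores[test_idx, j])[0, 1]
            for j in range(n_scores)]
    fold_corrs.append(np.mean(corr))

mean_corr = float(np.mean(fold_corrs))
print(mean_corr)  # near zero here, since the stand-in data carry no signal
```

On real data the averaged test metric would be reported per cognitive score or overall, exactly as the fold loop above computes it.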
Hardware Specification | Yes | All experiments were conducted on computational cluster nodes equipped with a Tesla V100 16GB GPU and 4 AMD EPYC 7742 64-core processors. The VRAM-intensive experiments in Figure 3 were performed on an A100 PCIe 40GB GPU.
Software Dependencies | Yes | All experiments were performed in Python using the TensorFlow Probability library (Dillon et al., 2017). We minimally pre-process the signal using the nilearn Python library (Abraham et al., 2014).
Experiment Setup | Yes | All experiments are performed using the Adam optimizer (Kingma & Ba, 2015). At training, the ELBO was estimated using a Monte Carlo procedure with 8 samples. All 3 architectures (baseline, PAVI-F, PAVI-E) used a MAF with [32, 32] hidden units for the flows F_i, and an encoding size of 128. For the encoder f in the PAVI-E scheme, we used a multi-head architecture with 4 heads of 32 units each and 2 ISAB blocks with 64 inducing points.
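The training setup in this row (Adam maximizing an 8-sample Monte Carlo ELBO estimate) can be sketched end-to-end in numpy on a toy model. Only the optimizer choice and the 8-sample estimate come from the row; the conjugate Gaussian model, the Gaussian q (the real architectures use normalizing flows), the step-size schedule, and the iterate averaging are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: p(theta) = N(0, 1), p(x | theta) = N(theta, 1),
# observed x = 0.5, so the true posterior is N(0.25, 0.5).
x_obs = 0.5

# Variational parameters of q(theta) = N(mu, exp(log_sigma)^2).
params = np.zeros(2)  # [mu, log_sigma]

# Adam state (Kingma & Ba, 2015) with a 1/sqrt(t) step-size decay.
m, v = np.zeros(2), np.zeros(2)
beta1, beta2, eps = 0.9, 0.999, 1e-8
trace = []

for t in range(1, 4001):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    # Reparameterized Monte Carlo gradient of the ELBO, 8 samples per step.
    noise = rng.normal(size=8)
    theta = mu + sigma * noise
    dlogp = -theta + (x_obs - theta)  # d log p(x, theta) / d theta
    grad = -np.array([np.mean(dlogp),                        # d(-ELBO)/d mu
                      np.mean(dlogp * sigma * noise) + 1.0])  # d(-ELBO)/d log_sigma
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    lr = 0.05 / np.sqrt(t)
    params = params - lr * (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + eps)
    trace.append(params.copy())

# Average late iterates to smooth out the stochastic-gradient jitter.
mu_hat, log_sigma_hat = np.mean(trace[2000:], axis=0)
print(mu_hat, np.exp(log_sigma_hat) ** 2)  # close to the true posterior (0.25, 0.5)
```

The +1 in the log_sigma gradient is the entropy term of the Gaussian q; with a flow-based q, as in the paper, both the sampling path and the entropy term would instead go through the flow's forward pass and log-determinant.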