PAVI: Plate-Amortized Variational Inference
Authors: Louis Rouillard, Alexandre Le Bris, Thomas Moreau, Demian Wassermann
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the practical utility of PAVI through a challenging neuroimaging example featuring 400 million latent parameters, demonstrating a significant step towards scalable and expressive Variational Inference. In this section, we show how PAVI unlocks hierarchical Bayesian model inference for large-scale problems by matching the inference quality of SOTA methods while providing faster convergence and lighter parameterization. Our experiments also highlight the differences between our two encoding schemes, PAVI-E and PAVI-F. **ELBO metric**: throughout this section, we use the ELBO as a proxy for the KL divergence between the variational posterior and the unknown true posterior (Blei et al., 2017). The ELBO is measured across 20 samples X, with 5 repetitions per sample. The ELBO allows us to compare the relative performance of different architectures on a given inference problem. |
| Researcher Affiliation | Academia | Louis Rouillard, Alexandre Le Bris, Thomas Moreau, Demian Wassermann: Université Paris-Saclay, Inria, CEA, Palaiseau, 91120, France (all four authors share this affiliation). |
| Pseudocode | Yes | A.3 PAVI algorithms Algorithm 1: PAVI architecture build Algorithm 2: PAVI stochastic training Algorithm 3: PAVI inference |
| Open Source Code | Yes | As part of our submission we furthermore packaged and released the code associated with our experiments. |
| Open Datasets | Yes | We use the HCP dataset (Van Essen et al., 2012): 2 acquisitions from 1,000 subjects, with millions of measures per acquisition, and over 400 million parameters Θ to infer. |
| Dataset Splits | Yes | For the cognitive scores prediction in Figure 4, we reproduced the standard methodology from Kong et al. (2019). We perform a 20-fold cross-validation across 1,000 subjects. We start from the logits$_{s,n}$ associated with each subject. We use PCA to project the features from the 19 training folds onto their 100 first components (with 33% explained variance). We then train a linear regression to predict each of the 13 cognitive scores from the training-fold PCA features. We compute the test performance on the test fold using the training-fold PCA and linear regression, averaged across the 13 cognitive measures. |
| Hardware Specification | Yes | All experiments were conducted on computational cluster nodes equipped with a Tesla V100 16GB GPU and 4 AMD EPYC 7742 64-core processors. VRAM-intensive experiments in Figure 3 were performed on an Ampere 100 PCIe 40GB GPU. |
| Software Dependencies | Yes | All experiments were performed in Python using the TensorFlow Probability library (Dillon et al., 2017). We minimally pre-process the signal using the nilearn Python library (Abraham et al., 2014). |
| Experiment Setup | Yes | All experiments are performed using the Adam optimizer (Kingma & Ba, 2015). At training, the ELBO was estimated using a Monte Carlo procedure with 8 samples. All three architectures (baseline, PAVI-F, PAVI-E) used a MAF with [32, 32] hidden units for the flows $F_i$, and an encoding size of 128. For the encoder f in the PAVI-E scheme, we used a multi-head architecture with 4 heads of 32 units each and 2 ISAB blocks with 64 inducing points. |
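The report repeatedly invokes the ELBO as a proxy for the (intractable) KL divergence to the true posterior, estimated by Monte Carlo. The sketch below, which is not the paper's code, shows the basic estimator on a toy Gaussian problem; `log_joint`, `log_q`, and `sample_q` are hypothetical stand-ins for the model's log joint density, the variational log density, and its sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_estimate(log_joint, log_q, sample_q, n_samples=8):
    """Monte Carlo ELBO: average of log p(x, z) - log q(z) over draws z ~ q."""
    zs = [sample_q() for _ in range(n_samples)]
    return float(np.mean([log_joint(z) - log_q(z) for z in zs]))

# Toy example: q = N(mu, 1) approximating a standard-normal target p(z) = N(0, 1).
mu = 0.5
log_joint = lambda z: -0.5 * z**2 - 0.5 * np.log(2 * np.pi)        # log N(z; 0, 1)
log_q = lambda z: -0.5 * (z - mu)**2 - 0.5 * np.log(2 * np.pi)     # log N(z; mu, 1)
sample_q = lambda: rng.normal(mu, 1.0)

print(elbo_estimate(log_joint, log_q, sample_q, n_samples=1000))
```

Here the target is already normalized, so the ELBO converges to -KL(q || p) = -mu²/2 = -0.125 as the sample count grows, which is why a higher ELBO indicates a tighter posterior approximation when comparing architectures.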
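The dataset-splits row describes a 20-fold cross-validation where PCA and a linear regression are fit on the 19 training folds only, then applied to the held-out fold. A minimal numpy-only sketch of that protocol, with synthetic stand-in data (the real features are the per-subject logits, projected onto 100 PCA components; the sizes here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: low-rank subject features and 13 "cognitive scores".
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 50))      # subject features
Y = latent @ rng.normal(size=(5, 13))      # 13 scores per subject

def pca_fit(X_train, n_components):
    """Return the training mean and the top principal directions."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components].T

folds = np.array_split(rng.permutation(len(X)), 20)
r2_per_fold = []
for k in range(20):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(20) if j != k])
    # PCA is fit on the 19 training folds only, then applied to the test fold.
    mean, basis = pca_fit(X[train_idx], n_components=20)
    Z_tr = (X[train_idx] - mean) @ basis
    Z_te = (X[test_idx] - mean) @ basis
    # Linear regression (with intercept) from PCA features to the 13 scores.
    A_tr = np.c_[Z_tr, np.ones(len(Z_tr))]
    W, *_ = np.linalg.lstsq(A_tr, Y[train_idx], rcond=None)
    pred = np.c_[Z_te, np.ones(len(Z_te))] @ W
    resid = ((Y[test_idx] - pred) ** 2).sum()
    total = ((Y[test_idx] - Y[test_idx].mean(axis=0)) ** 2).sum()
    r2_per_fold.append(1.0 - resid / total)

print(np.mean(r2_per_fold))
```

The key design point the row emphasizes is leakage avoidance: both the PCA basis and the regression weights are estimated from training folds only, so the test-fold score is an honest out-of-sample estimate.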
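The experiment-setup row states that training maximizes a Monte Carlo ELBO estimate (8 samples) with Adam. As a hedged illustration of that training loop, not the paper's architecture, here is reparameterized variational inference on a 1-D Gaussian with a hand-rolled Adam update; the target N(3, 0.5²) and all hyperparameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Variational family q = N(mu, exp(log_sigma)^2); target posterior N(3, 0.5^2).
params = np.zeros(2)                 # [mu, log_sigma]
m, v = np.zeros(2), np.zeros(2)      # Adam first/second moment estimates
lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8

for t in range(1, 2001):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    eps_z = rng.normal(size=8)                 # 8 Monte Carlo samples per step
    z = mu + sigma * eps_z                     # reparameterization trick
    # Gradient of -ELBO. ELBO = E_q[log p(z)] + H(q), with H = log_sigma + const,
    # and d log p / dz = -(z - 3) / 0.25 for the Gaussian target.
    dlogp_dz = -(z - 3.0) / 0.25
    g_mu = -np.mean(dlogp_dz)
    g_ls = -np.mean(dlogp_dz * sigma * eps_z) - 1.0
    grad = np.array([g_mu, g_ls])
    # Adam update with bias correction (Kingma & Ba, 2015).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    params -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)

print(params[0], np.exp(params[1]))
```

The fitted `mu` and `sigma` approach the target's 3 and 0.5; the gradient noise from the 8-sample ELBO estimate is what makes Adam's adaptive step sizes a natural fit for this kind of stochastic training.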