Variational Gibbs Inference for Statistical Model Estimation from Incomplete Data

Authors: Vaidotas Simkus, Benjamin Rhodes, Michael U. Gutmann

JMLR 2023

Reproducibility checklist — variable, result, and supporting excerpt from the paper:
- Research Type: Experimental. Evidence: "We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models such as variational autoencoders and normalising flows from incomplete data."
- Researcher Affiliation: Academia. Evidence: the author block lists Vaidotas Simkus, Benjamin Rhodes, and Michael U. Gutmann (email addresses redacted in this extract), School of Informatics, University of Edinburgh.
- Pseudocode: Yes. Evidence: Algorithm 1, the variational Gibbs inference (VGI) algorithm.
  Input: pθ(x), the statistical model with parameters θ; qφj(xj | x−j) for j ∈ {1 . . . d}, the variational conditional models with parameters φ; D, the incomplete data set; K, the number of imputations of each incomplete data-point; f0(xmis | xobs), the initial imputation distribution; αθ and αφ, the parameter learning rates; max epochs, the number of epochs.
  Output: θ, φ, and the K-times imputed data set DK.
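The Input/Output listing above describes a pseudo-Gibbs imputation loop: each incomplete data-point carries K imputation chains, and each chain is refreshed by resampling missing dimensions from the learned conditionals. A minimal illustrative sketch, not the authors' implementation — `vgi_epoch_sketch` and the `sample_conditional` stub are hypothetical stand-ins for the learned qφj(xj | x−j):

```python
import random

def vgi_epoch_sketch(data, mask, sample_conditional, K=5, G=3):
    """One illustrative pass of VGI-style pseudo-Gibbs imputation.

    data: list of rows (with placeholder values at missing entries)
    mask: list of rows of bools, True where the value is observed
    sample_conditional: callable (row, j) -> new value for dimension j,
        standing in for the learned conditional q_{phi_j}(x_j | x_{-j})
    K: number of imputation chains per incomplete row
    G: number of Gibbs updates per chain per epoch
    """
    # K independent imputation chains per data-point
    chains = [[list(row) for _ in range(K)] for row in data]
    for i, row_chains in enumerate(chains):
        missing = [j for j, obs in enumerate(mask[i]) if not obs]
        for chain in row_chains:
            for _ in range(G):
                if not missing:
                    break  # fully observed row: nothing to resample
                j = random.choice(missing)  # pick one missing dimension
                chain[j] = sample_conditional(chain, j)
    return chains  # K imputed versions of each row

# Toy usage: the conditional is replaced by a fixed stub for illustration.
data = [[1.0, 0.0], [0.5, 2.0]]
mask = [[True, False], [True, True]]
imputed = vgi_epoch_sketch(data, mask, lambda row, j: 0.0, K=2, G=3)
```

In the real algorithm the conditionals are parametric models trained jointly with θ; the stub here only shows where they plug into the Gibbs sweep.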
- Open Source Code: Yes. Evidence: "An interactive demo is available at github.com/vsimkus/variational-gibbs-inference." "The code for this and the following experimental sections is available at https://github.com/vsimkus/variational-gibbs-inference."
- Open Datasets: Yes. Evidence: "The original data set is available at https://cs.nyu.edu/~roweis/data/frey_rawface.mat." "We evaluate VGI on a selection of tabular data sets from the UCI machine-learning repository (Dua and Graff, 2017), which are commonly used to evaluate normalising flow models."
- Dataset Splits: Yes. Evidence: "The training data set has 6400 data-points and the test data set has 5000 data-points." "The training data set has 2400 data-points and the test data set has 3000 data-points." "We consider five fractions of missingness in the training data, ranging from 16.6% to 83.3%, and simulate incomplete training data by generating a binary missingness mask uniformly at random (MCAR)."
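The MCAR masking described in the excerpt (each entry missing independently, uniformly at random) can be sketched as below; `mcar_mask` and its argument names are illustrative, not taken from the paper's code:

```python
import random

def mcar_mask(n_rows, n_cols, missing_frac, seed=0):
    """Binary missingness mask generated uniformly at random (MCAR):
    each entry is independently missing with probability missing_frac.
    True marks a missing entry."""
    rng = random.Random(seed)
    return [[rng.random() < missing_frac for _ in range(n_cols)]
            for _ in range(n_rows)]

# ~16.6% missingness, the lowest of the five fractions mentioned above
mask = mcar_mask(6400, 10, missing_frac=1 / 6)
```

The paper's five missingness fractions (16.6% to 83.3%) correspond to varying `missing_frac`; the column count of 10 here is an arbitrary illustration.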
- Hardware Specification: No. The paper mentions "on CPU" in the context of computation time for some methods, but does not give specific CPU or GPU models or any other details of the hardware used for the experiments.
- Software Dependencies: No. The paper mentions software such as Python, scikit-learn, the Adam optimiser (Kingma and Ba, 2014), and AMSGrad (Reddi et al., 2018), but does not give version numbers for these components.
- Experiment Setup: Yes. Evidence: "We use K = 5 imputation chains for each incomplete data-point, and G = 3 (toy-data) and G = 5 (FA-Frey) Gibbs updates. In the Monte Carlo averaging in ĴVGI we select M = 1 (toy-data) and M = 10 (FA-Frey) missing dimensions. The learning rate was decayed according to a cosine schedule. We use the Adam optimiser, whereas the variational parameters φ are fitted using AMSGrad. The shared network consists of one residual block with 256 hidden features, and outputs a 128-dimensional shared representation. Each elementwise transformation network consists of 2 residual blocks with 32 hidden features."
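The cosine learning-rate decay mentioned in the setup is commonly implemented as below. This is a generic sketch — the paper does not specify warmup, a learning-rate floor, or whether the schedule steps per batch or per epoch — and `cosine_lr` is a hypothetical helper, not the authors' code:

```python
import math

def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```

With Adam for θ and AMSGrad for φ as in the excerpt, such a schedule would typically scale both learning rates αθ and αφ over training.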