Variational Gibbs Inference for Statistical Model Estimation from Incomplete Data
Authors: Vaidotas Simkus, Benjamin Rhodes, Michael U. Gutmann
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models such as variational autoencoders and normalising flows from incomplete data. |
| Researcher Affiliation | Academia | Vaidotas Simkus, Benjamin Rhodes, Michael U. Gutmann; School of Informatics, University of Edinburgh |
| Pseudocode | Yes | Algorithm 1: Variational Gibbs inference (VGI). Input: pθ(x), statistical model with parameters θ; qφj(xj \| x−j) for j ∈ {1, ..., d}, variational conditional models with parameters φ; D, incomplete data set; K, number of imputations of each incomplete data-point; f0(xmis \| xobs), initial imputation distribution; αθ and αφ, the parameter learning rates; max_epochs, number of epochs. Output: θ, φ, and K-times imputed data DK. |
| Open Source Code | Yes | An interactive demo is available at github.com/vsimkus/variational-gibbs-inference. The code for this and the following experimental sections is available at https://github.com/vsimkus/variational-gibbs-inference. |
| Open Datasets | Yes | The original data set is available at https://cs.nyu.edu/~roweis/data/frey_rawface.mat. We evaluate VGI on a selection of tabular data sets from the UCI machine-learning repository (Dua and Graff, 2017), which are commonly used to evaluate normalising flow models |
| Dataset Splits | Yes | The training data set has 6400 data-points and the test data set has 5000 data-points. The training data set has 2400 data-points and the test data set has 3000 data-points. We consider five fractions of missingness in the training data, ranging from 16.6% to 83.3%, and simulate incomplete training data by generating a binary missingness mask uniformly at random (MCAR). |
| Hardware Specification | No | The paper mentions 'on CPU' in the context of computation time for some methods, but does not provide specific CPU models, GPU models, or any detailed hardware specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software like Python, scikit-learn, Adam optimiser (Kingma and Ba, 2014), and AMSGrad (Reddi et al., 2018), but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use K = 5 imputation chains for each incomplete data-point, and G = 3 (toy-data) and G = 5 (FA-Frey) Gibbs updates. In the Monte Carlo averaging in ĴVGI we select M = 1 (toy-data) and M = 10 (FA-Frey) missing dimensions. The learning rate was decayed according to a cosine schedule. We use the Adam optimiser for the model parameters θ, whereas the variational parameters φ are fitted using AMSGrad. The shared network consists of one residual block with 256 hidden features and outputs a 128-dimensional shared representation. Each elementwise transformation network consists of 2 residual blocks with 32 hidden features. |
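The structure of Algorithm 1 quoted in the pseudocode row can be illustrated with a schematic NumPy sketch. The toy Gaussian model, the closed-form parameter step, and all variable names here are ours, not the paper's: VGI instead updates θ and φ by gradient steps on a variational bound, with qφj as learned conditional models. The sketch only shows the chain bookkeeping (K imputation chains, G Gibbs updates per parameter step, f0 initialisation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy incomplete data: n points in d dimensions, entries missing at random.
n, d, K, G = 200, 2, 5, 3          # K imputation chains, G Gibbs updates/step
x = rng.normal(size=(n, d))
mask = rng.random((n, d)) >= 0.5    # True = observed
x_obs = np.where(mask, x, 0.0)

# K imputation chains per data-point; missing entries initialised from a
# stand-in f0(xmis | xobs) = N(0, 1).
chains = np.tile(x_obs[:, None, :], (1, K, 1))
chain_missing = ~np.tile(mask[:, None, :], (1, K, 1))
chains[chain_missing] = rng.normal(size=chain_missing.sum())

mu = np.zeros(d)                    # model parameters θ of the toy Gaussian
for epoch in range(50):
    for _ in range(G):
        # Gibbs step: resample one missing dimension per chain from a
        # stand-in conditional (here the model's current marginal; VGI
        # would sample from the variational conditional qφj).
        j = rng.integers(d)
        miss_j = ~mask[:, j]
        chains[miss_j, :, j] = rng.normal(mu[j], 1.0, size=(miss_j.sum(), K))
    # Parameter step: fit θ to the K-times imputed data (closed form here;
    # VGI takes a gradient step on the variational objective instead).
    mu = chains.reshape(-1, d).mean(axis=0)
```

The key design point carried over from the algorithm is that the imputation chains persist across epochs, so each Gibbs update refines the previous imputations rather than resampling them from scratch.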
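The MCAR masking described in the dataset-splits row (a binary missingness mask generated uniformly at random) can be reproduced with a short sketch; the function name, seed, and the 10-column width are our illustration choices.

```python
import numpy as np

def mcar_mask(n, d, frac_missing, seed=None):
    """Binary missingness mask drawn uniformly at random (MCAR):
    each entry is missing independently with probability frac_missing.
    Returns 1 for observed entries, 0 for missing ones."""
    rng = np.random.default_rng(seed)
    return (rng.random((n, d)) >= frac_missing).astype(np.int8)

# e.g. the lowest missingness fraction reported in the paper, 16.6% (1/6)
mask = mcar_mask(6400, 10, frac_missing=1 / 6, seed=0)
```

Because the mask is independent of the data values, missingness is MCAR by construction, which is the setting the paper simulates for its training sets.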