Structured Uncertainty in the Observation Space of Variational Autoencoders

Authors: James Langley, Miguel Monteiro, Charles Jones, Nick Pawlowski, Ben Glocker

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform the comparison in two datasets: the CELEBA dataset (Liu et al., 2015) and the UK Biobank (UKBB) Brain Imaging dataset (Miller et al., 2016). For all models, we use a latent space of dimension 128. For the low-rank model, we use a rank of 25. [...] Quantitative evaluation of generative modelling is an inherently difficult task due to its subjective nature. [...] The Fréchet Inception Distance (FID) (Heusel et al., 2017) is the current standard choice of metric due to its consistency with human perception (Borji, 2019). We use it to evaluate our generative models (Seitzer, 2020) and report the results in Table 1."
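The FID cited above is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. The paper evaluates it with the pytorch-fid package (Seitzer, 2020); the following NumPy sketch only illustrates the underlying formula, with random features standing in for Inception activations (all names and dimensions here are illustrative, not taken from the paper's code):

```python
import numpy as np

# Frechet distance between Gaussians N(mu1, S1) and N(mu2, S2):
#   d^2 = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))

def psd_sqrt(m):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def frechet_distance(mu1, s1, mu2, s2):
    # Tr((S1 S2)^(1/2)) computed via the symmetric form
    # (S2^(1/2) S1 S2^(1/2))^(1/2), which keeps everything real.
    r2 = psd_sqrt(s2)
    w = np.linalg.eigvalsh(r2 @ s1 @ r2)
    trace_sqrt = np.sqrt(np.clip(w, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 16))  # stand-in for Inception activations
mu, sigma = feats.mean(axis=0), np.cov(feats, rowvar=False)
print(frechet_distance(mu, sigma, mu, sigma) < 1e-6)  # identical -> ~0, prints True
```

In practice the two feature sets come from the real dataset and from model samples; identical distributions give a distance of (numerically) zero, and any mean shift or covariance mismatch increases it.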
Researcher Affiliation | Collaboration | James Langley (Department of Computing, Imperial College London, UK); Miguel Monteiro (Department of Computing, Imperial College London, UK); Charles Jones (Department of Computing, Imperial College London, UK); Nick Pawlowski (Microsoft Research, Cambridge, UK, and Department of Computing, Imperial College London, UK); Ben Glocker (Department of Computing, Imperial College London, UK)
Pseudocode | No | The paper describes methods and equations for the SOS-VAE, but does not present any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "All code is made publicly available on https://github.com/biomedia-mira/sos-vae together with clear instructions how to fully reproduce our results on CELEBA to facilitate future work and ease of comparisons."
Open Datasets | Yes | "We perform the comparison in two datasets: the CELEBA dataset (Liu et al., 2015) and the UK Biobank (UKBB) Brain Imaging dataset (Miller et al., 2016)."
Dataset Splits | No | The paper mentions using the CELEBA dataset (Liu et al., 2015) and the UK Biobank (UKBB) Brain Imaging dataset (Miller et al., 2016), but does not provide specific details on how these datasets were split into training, validation, and test sets.
Hardware Specification | Yes | "In contrast, the SOS-VAE model used in our experiments is trained on a single 16-GB T4 GPU in 72 hours, which is a substantial difference of practical importance."
Software Dependencies | No | The paper does not explicitly list software dependencies with version numbers for its implementation. While PyTorch is mentioned in the context of a tool for FID calculation, the PyTorch version used for the main model implementation is not specified.
Experiment Setup | Yes | "For all models, we use a latent space of dimension 128. For the low-rank model, we use a rank of 25. For the CELEBA dataset we use a target KL loss, ξKL, of 45 for both models and ξH = 504750 for our model. For the UKBB dataset we use a target KL loss, ξKL, of 15 for both models and ξH = 198906 for our model."
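The rank-25 setting above refers to a low-rank-plus-diagonal covariance over pixels, which lets the likelihood capture correlated observation noise without materializing a full pixel-by-pixel covariance matrix. A minimal sketch of sampling from such a Gaussian, assuming a covariance of the form F Fᵀ + diag(d²) (the image size and all parameter values below are illustrative placeholders, not the paper's trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, rank = 64 * 64, 25                 # rank 25 as in the paper; image size illustrative
mu = rng.normal(size=n_pixels)               # decoder mean (placeholder values)
F = 0.1 * rng.normal(size=(n_pixels, rank))  # low-rank covariance factor
d = 0.05 * np.ones(n_pixels)                 # per-pixel diagonal standard deviations

# Sample x ~ N(mu, F F^T + diag(d^2)) without forming the full covariance:
#   x = mu + F z + d * eps, with z ~ N(0, I_rank), eps ~ N(0, I_n).
z = rng.normal(size=rank)
eps = rng.normal(size=n_pixels)
x = mu + F @ z + d * eps
print(x.shape)  # (4096,)
```

The factor trick keeps sampling cost at O(n·rank) rather than O(n²), which is what makes a structured observation-space covariance tractable at image resolution.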