reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Identifiable Deep Generative Models via Sparse Decoding

Authors: Gemma Elyse Moran, Dhanya Sridhar, Yixin Wang, David Blei

TMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically study the sparse VAE with both simulated and real data. We find that it recovers meaningful latent factors and has smaller heldout reconstruction error than related methods.
Researcher Affiliation	Academia	Gemma E. Moran, Columbia University; Dhanya Sridhar, Mila Quebec AI Institute and Université de Montréal; Yixin Wang, University of Michigan; David M. Blei, Columbia University
Pseudocode	Yes	Algorithm 1: The sparse VAE
Open Source Code	Yes	The sparse VAE implementation may be found at https://github.com/gemoran/sparse-vae-code.
Open Datasets	Yes	PeerRead (Kang et al., 2018). Dataset of word counts for paper abstracts (N ≈ 10,000, G = 500). MovieLens (Harper and Konstan, 2015). Dataset of binary user-movie ratings (N = 100,000, G = 300). Zeisel (Zeisel et al., 2015). Dataset of RNA molecule counts in mouse cortex cells (N = 3005, G = 558).
Dataset Splits	Yes	All results are averaged over five splits of the data, with standard deviation in parentheses. We assess this question using the semi-synthetic PeerRead dataset, where the train and test data were generated by factors with different correlations.
Hardware Specification	Yes	GPU: NVIDIA TITAN Xp graphics card (24GB). CPU: Intel E4-2620 v4 processor (64GB).
Software Dependencies	No	For stochastic optimization, we use automatic differentiation in PyTorch, with optimization using Adam (Kingma and Ba, 2015) with default settings (beta1=0.9, beta2=0.999). For LDA, we used Python's sklearn package with default settings.
Experiment Setup	Yes	Table 6: Settings for each experiment. Synthetic data ... Hidden layers: 3; layer dimension: 50; latent space dimension: 5; learning rate: 0.01; epochs: 200; batch size: 100; loss function: Gaussian; sparse VAE: λ1 = 1, λ0 = 10; β-VAE: [2, 4, 6, 8, 16]; VSC: α = 0.01; OI-VAE: λ = 1, p = 5; runtime per split: CPU, 2 mins
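
For readers who want to rerun the synthetic-data experiment, the settings quoted in the Experiment Setup row can be collected into a single configuration object. The following is a minimal sketch using only the Python standard library, not code from the authors' repository: the class name, field names, and the `summarize_splits` helper are hypothetical, and the use of the sample standard deviation is an assumption, since the quoted text does not say which estimator was used.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class SyntheticSettings:
    """Synthetic-data settings as quoted from Table 6 (field names are hypothetical)."""
    hidden_layers: int = 3
    layer_dim: int = 50
    latent_dim: int = 5
    learning_rate: float = 0.01
    epochs: int = 200
    batch_size: int = 100
    adam_betas: tuple = (0.9, 0.999)  # Adam defaults quoted in the Software Dependencies row
    lambda1: float = 1.0              # sparse VAE λ1
    lambda0: float = 10.0             # sparse VAE λ0


def summarize_splits(scores):
    """Format per-split scores as 'mean (std)', matching the reporting style quoted above.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return f"{mean(scores):.2f} ({stdev(scores):.2f})"
```

Calling `summarize_splits` on a list of five per-split errors returns a string such as `"3.00 (1.58)"` for the placeholder input `[1.0, 2.0, 3.0, 4.0, 5.0]`; those numbers are illustrative only and are not results from the paper.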