Diagnosing and Fixing Manifold Overfitting in Deep Generative Models

Authors: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini

TMLR 2022

Reproducibility Variable — Result — LLM Response
Research Type — Experimental
"Finally, we achieve significant empirical improvements in sample quality over maximum likelihood, strongly supporting our theoretical findings. We show these improvements persist even when accounting for the additional parameters of the second-step model, or when adding Gaussian noise to the data as an attempt to remove the dimensionality mismatch that causes manifold overfitting."
Researcher Affiliation — Industry
Gabriel Loaiza-Ganem (EMAIL, Layer 6 AI); Brendan Leigh Ross (EMAIL, Layer 6 AI); Jesse C. Cresswell (EMAIL, Layer 6 AI); Anthony L. Caterini (EMAIL, Layer 6 AI)
Pseudocode — No
The paper describes methods and procedures in narrative text and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code — Yes
"Our code provides baseline implementations of all our considered GAEs and DGMs, which we hope will be useful to the community even outside of our proposed two-step methodology." (https://github.com/layer6ai-labs/two_step_zoo)
Open Datasets — Yes
"We show the results in Table 1 for MNIST, FMNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), and CIFAR-10 (Krizhevsky, 2009)."
Dataset Splits — No
The paper mentions using well-known datasets (MNIST, FMNIST, SVHN, CIFAR-10, and FFHQ) that typically have predefined splits, but it does not explicitly state the train/validation/test split percentages, counts, or splitting methodology in the main text or appendices.
Hardware Specification — No
The paper lists the software packages used (e.g., PyTorch, TensorFlow) but gives no details about the hardware (e.g., GPU or CPU models) on which the experiments were run.
Software Dependencies — No
"We wrote our code in Python (Van Rossum & Drake, 2009), and specifically relied on the following packages: Matplotlib (Hunter, 2007), TensorFlow (Abadi et al., 2015) (particularly for TensorBoard), Jupyter Notebook (Kluyver et al., 2016), PyTorch (Paszke et al., 2019), nflows (Durkan et al., 2020), NumPy (Harris et al., 2020), prdc (Naeem et al., 2020), pytorch-fid (Seitzer, 2020), and functorch (He & Zou, 2021)." The specific version numbers for these software dependencies are not provided.
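As a rough sketch, the quoted dependency list could be installed in one command. The PyPI package names below are assumptions (e.g. `notebook` for Jupyter Notebook, `torch` for PyTorch); no versions are pinned, since the paper does not state them.

```shell
# Hypothetical install line for the packages named above; the paper gives
# no version numbers, so exact releases are unknown.
pip install matplotlib tensorflow notebook torch nflows numpy prdc pytorch-fid functorch
```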
Experiment Setup — Yes
"For all experiments, we use the Adam optimizer, typically with learning rate 0.001. For all experiments we also clip gradient entries larger than 10 during optimization. We also set d = 20 in all experiments."
C.2 (VAE from Fig. 2): "The Gaussian VAE had d = 1, D = 1, and both the encoder and decoder have a single hidden layer with 25 units and ReLU activations. We use the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.001 and train for 200 epochs. We use gradient norm clipping with a value of 10."
C.3 (Simulated Data): "For the EBM model, we use an energy function with two hidden layers of 25 units each and Swish activations (Ramachandran et al., 2017). We use the Adam optimizer with learning rate 0.01, and gradient norm clipping with a value of 1. We train for 100 epochs. We follow Du & Mordatch (2019) for the training of the EBM, and use 0.1 for the objective regularization value, iterate Langevin dynamics for 60 iterations at every training step, use a step size of 10 within Langevin dynamics, sample new images with probability 0.05 in the buffer, use Gaussian noise with standard deviation 0.005 in Langevin dynamics, and truncate gradients to (-0.03, 0.03) in Langevin dynamics."
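The Langevin-dynamics sampling loop quoted for the EBM in C.3 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the quadratic toy energy, the 2-D sample dimension, and all function names are assumptions; only the hyperparameters (60 iterations, step size 10, noise standard deviation 0.005, and gradient truncation to (-0.03, 0.03)) come from the quoted setup.

```python
import numpy as np

def toy_energy_grad(x):
    """Gradient of E(x) = 0.5 * ||x||^2, a stand-in for the paper's
    learned energy network (an assumption for illustration)."""
    return x

def langevin_sample(x0, grad_fn, n_steps=60, step_size=10.0,
                    noise_std=0.005, grad_clip=0.03, rng=None):
    """Run Langevin dynamics with per-step gradient truncation,
    following the hyperparameters quoted from appendix C.3."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # Truncate the energy gradient to (-grad_clip, grad_clip),
        # then take a gradient step plus Gaussian noise.
        g = np.clip(grad_fn(x), -grad_clip, grad_clip)
        x = x - step_size * g + noise_std * rng.normal(size=x.shape)
    return x

sample = langevin_sample(np.ones(2), toy_energy_grad,
                         rng=np.random.default_rng(0))
```

With the large quoted step size, the toy chain oscillates in a small neighbourhood of the energy minimum rather than converging exactly, which is why EBM samplers of this kind pair aggressive steps with gradient truncation and small noise.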