Diagnosing and Fixing Manifold Overfitting in Deep Generative Models

Authors: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini

TMLR 2022

Reproducibility Variable — Result — LLM Response
Research Type — Experimental
"Finally, we achieve significant empirical improvements in sample quality over maximum likelihood, strongly supporting our theoretical findings. We show these improvements persist even when accounting for the additional parameters of the second-step model, or when adding Gaussian noise to the data as an attempt to remove the dimensionality mismatch that causes manifold overfitting."
Researcher Affiliation — Industry
Gabriel Loaiza-Ganem (EMAIL, Layer 6 AI); Brendan Leigh Ross (EMAIL, Layer 6 AI); Jesse C. Cresswell (EMAIL, Layer 6 AI); Anthony L. Caterini (EMAIL, Layer 6 AI)
Pseudocode — No
The paper describes methods and procedures in narrative text and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code — Yes
"Our code provides baseline implementations of all our considered GAEs and DGMs, which we hope will be useful to the community even outside of our proposed two-step methodology." (https://github.com/layer6ai-labs/two_step_zoo)
Open Datasets — Yes
"We show the results in Table 1 for MNIST, FMNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), and CIFAR-10 (Krizhevsky, 2009)."
Dataset Splits — No
The paper mentions using well-known datasets (MNIST, FMNIST, SVHN, CIFAR-10, and FFHQ) that typically have predefined splits, but it does not explicitly state the train/validation/test split percentages, counts, or splitting methodology in the main text or appendices.
Hardware Specification — No
The paper lists the software packages used (e.g., PyTorch, TensorFlow) but gives no details about the hardware (e.g., GPU or CPU models) on which the experiments were run.
Software Dependencies — No
"We wrote our code in Python (Van Rossum & Drake, 2009), and specifically relied on the following packages: Matplotlib (Hunter, 2007), TensorFlow (Abadi et al., 2015) (particularly for TensorBoard), Jupyter Notebook (Kluyver et al., 2016), PyTorch (Paszke et al., 2019), nflows (Durkan et al., 2020), NumPy (Harris et al., 2020), prdc (Naeem et al., 2020), pytorch-fid (Seitzer, 2020), and functorch (He & Zou, 2021)." The specific version numbers for these software dependencies are not provided.
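As a rough sketch, the quoted dependency list could be installed in one command. The PyPI package names below are assumptions (e.g. `notebook` for Jupyter Notebook, `torch` for PyTorch); no versions are pinned, since the paper does not state them.

```shell
# Hypothetical install line for the packages named above; the paper gives
# no version numbers, so exact releases are unknown.
pip install matplotlib tensorflow notebook torch nflows numpy prdc pytorch-fid functorch
```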
Experiment Setup — Yes
"For all experiments, we use the Adam optimizer, typically with learning rate 0.001. For all experiments we also clip gradient entries larger than 10 during optimization. We also set d = 20 in all experiments."
C.2 (VAE from Fig. 2): "The Gaussian VAE had d = 1, D = 1, and both the encoder and decoder have a single hidden layer with 25 units and ReLU activations. We use the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.001 and train for 200 epochs. We use gradient norm clipping with a value of 10."
C.3 (Simulated Data): "For the EBM model, we use an energy function with two hidden layers of 25 units each and Swish activations (Ramachandran et al., 2017). We use the Adam optimizer with learning rate 0.01, and gradient norm clipping with a value of 1. We train for 100 epochs. We follow Du & Mordatch (2019) for the training of the EBM, and use 0.1 for the objective regularization value, iterate Langevin dynamics for 60 iterations at every training step, use a step size of 10 within Langevin dynamics, sample new images with probability 0.05 in the buffer, use Gaussian noise with standard deviation 0.005 in Langevin dynamics, and truncate gradients to (-0.03, 0.03) in Langevin dynamics."
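The Langevin-dynamics sampling loop quoted for the EBM in C.3 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the quadratic toy energy, the 2-D sample dimension, and all function names are assumptions; only the hyperparameters (60 iterations, step size 10, noise standard deviation 0.005, and gradient truncation to (-0.03, 0.03)) come from the quoted setup.

```python
import numpy as np

def toy_energy_grad(x):
    """Gradient of E(x) = 0.5 * ||x||^2, a stand-in for the paper's
    learned energy network (an assumption for illustration)."""
    return x

def langevin_sample(x0, grad_fn, n_steps=60, step_size=10.0,
                    noise_std=0.005, grad_clip=0.03, rng=None):
    """Run Langevin dynamics with per-step gradient truncation,
    following the hyperparameters quoted from appendix C.3."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # Truncate the energy gradient to (-grad_clip, grad_clip),
        # then take a gradient step plus Gaussian noise.
        g = np.clip(grad_fn(x), -grad_clip, grad_clip)
        x = x - step_size * g + noise_std * rng.normal(size=x.shape)
    return x

sample = langevin_sample(np.ones(2), toy_energy_grad,
                         rng=np.random.default_rng(0))
```

With the large quoted step size, the toy chain oscillates in a small neighbourhood of the energy minimum rather than converging exactly, which is why EBM samplers of this kind pair aggressive steps with gradient truncation and small noise.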