DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents
Authors: Kushagra Pandey, Avideep Mukherjee, Piyush Rai, Abhishek Kumar
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now investigate several properties of the DiffuseVAE model. We use a mix of qualitative and quantitative evaluations for demonstrating these properties on several image synthesis benchmarks including CIFAR-10 (Krizhevsky, 2009), CelebA-64 (Liu et al., 2015), CelebA-HQ (Karras et al., 2018) and LHQ-256 (Skorokhodov et al., 2021) datasets. For quantitative evaluations involving sample quality, we use the FID (Heusel et al., 2018) metric. |
| Researcher Affiliation | Collaboration | Kushagra Pandey EMAIL Department of Computer Science University of California, Irvine. Avideep Mukherjee EMAIL Department of Computer Science Indian Institute of Technology, Kanpur. Piyush Rai EMAIL Department of Computer Science Indian Institute of Technology, Kanpur. Abhishek Kumar EMAIL Google Research, Brain Team |
| Pseudocode | Yes | Algorithm 1 DDPM Training (Form. 2). Algorithm 2 DDPM Inference (Form. 2) |
| Open Source Code | Yes | For reproducibility, our source code is publicly available at https://github.com/kpandey008/DiffuseVAE. |
| Open Datasets | Yes | We use a mix of qualitative and quantitative evaluations for demonstrating these properties on several image synthesis benchmarks including CIFAR-10 (Krizhevsky, 2009), CelebA-64 (Liu et al., 2015), CelebA-HQ (Karras et al., 2018) and LHQ-256 (Skorokhodov et al., 2021) datasets. |
| Dataset Splits | Yes | For quantitative evaluations involving sample quality, we use the FID (Heusel et al., 2018) metric. We also report the Inception Score (IS) metric (Salimans et al., 2016) for state-of-the-art comparisons on CIFAR-10. For all the experiments, we set the number of diffusion time-steps (T) to 1000 during training. The noise schedule in the DDPM forward process was set to a linear schedule between β1 = 10^-4 and β2 = 0.02 during training. More details regarding the model and training hyperparameters can be found in Appendix F. Some additional experimental results are presented in Appendix G. ... For CelebA-HQ 256 comparisons we computed FID scores on 30k samples since the CelebA-HQ dataset contains 30k images. |
| Hardware Specification | Yes | We used a mix of 4 Nvidia 1080Ti GPUs (44GB memory), a cloud TPUv2-8 (64GB memory) and a cloud TPUv3-8 (128GB memory) for training the models. Specifically, we used the GPU setup for training our CIFAR-10 and CelebA-64 models while we utilized the TPUv2-8 for training CelebA-HQ models at the 128 x 128 resolution. Finally, we utilized the TPUv3-8 for training CelebA-HQ and LHQ models at 256 x 256 resolution. |
| Software Dependencies | No | The paper mentions using 'U-Net...from (Nichol & Dhariwal, 2021)' and 'U-Net...from DDIM (Song et al., 2021a) (https://github.com/ermongroup/ddim/blob/main/models/diffusion.py)' and 'torch-fidelity (Obukhov et al., 2020) package' but does not provide specific version numbers for the programming language or other key libraries used. |
| Experiment Setup | Yes | All hyperparameter details related to VAE and DDPM training in DiffuseVAE are listed in Table 9. Moreover, all hyperparameters (model and training) were shared between both DiffuseVAE formulations. ... For all the experiments, we set the number of diffusion time-steps (T) to 1000 during training. The noise schedule in the DDPM forward process was set to a linear schedule between β1 = 10^-4 and β2 = 0.02 during training. ... Latent code size was set to 1024 for LHQ-256 and CelebA-HQ (both 128 and 256 resolution variants) and 512 for the CIFAR-10 and CelebA (64 x 64) datasets. ... Effective Batch Size 128 ... Optimizer Adam(lr=1e-4) ... KL-weight 1.0 ... Dropout 0.3 ... Noise Schedule (default) Linear(1e-4, 0.02) ... EMA decay rate 0.9999 ... Grad. Clip Threshold 1.0 ... # of lr annealing steps 5000 ... Diffusion loss type Noise prediction (L2). |
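The diffusion hyperparameters reported above (T = 1000, linear β schedule from 1e-4 to 0.02) can be sketched as follows. This is a minimal illustration of the stated noise schedule, not code from the authors' repository; the function name and NumPy usage are assumptions.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule as reported in the paper:
    T = 1000 steps, betas linearly spaced from 1e-4 to 0.02."""
    return np.linspace(beta_start, beta_end, T)

betas = linear_beta_schedule()
alphas = 1.0 - betas
# Cumulative products \bar{alpha}_t parameterize the forward process
# q(x_t | x_0) = N(sqrt(\bar{alpha}_t) x_0, (1 - \bar{alpha}_t) I).
alpha_bars = np.cumprod(alphas)
```

The `alpha_bars` sequence decreases monotonically toward zero, so samples at large t are dominated by noise, which is the behavior the DDPM forward process relies on.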