Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Authors: Junyu Chen, Han Cai, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Song Han

ICLR 2025

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512×512, our DC-AE provides 19.1× inference speedup and 17.9× training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder."
Researcher Affiliation: Collaboration. Junyu Chen (MIT, Tsinghua University), Han Cai (NVIDIA), Junsong Chen (NVIDIA), Enze Xie (NVIDIA), Shang Yang (MIT), Haotian Tang (MIT), Muyang Li (MIT), Song Han (MIT, NVIDIA).
Pseudocode: No. The paper includes figures describing architectural components and training pipelines (e.g., Figure 4, Figure 6, Figure 10) but no formal pseudocode or algorithm blocks.
Open Source Code: Yes. https://github.com/mit-han-lab/efficientvit
Open Datasets: Yes. "We use a mixture of datasets to train autoencoders (baselines and DC-AE), containing ImageNet (Deng et al., 2009), SAM (Kirillov et al., 2023), Mapillary Vistas (Neuhold et al., 2017), and FFHQ (Karras et al., 2019)."
Dataset Splits: Yes. "For ImageNet experiments, we exclusively use the ImageNet training split to train autoencoders and diffusion models."
Hardware Specification: Yes. "We profile the training and inference throughput on the H100 GPU with PyTorch and TensorRT respectively. The latency is measured on the 3090 GPU with batch size 2."
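The latency measurement described above can be sketched as a minimal wall-clock timing harness. This is a generic sketch (the function name and defaults are mine), not the paper's profiling code; for CUDA models one would additionally synchronize the device before each timestamp, since GPU kernels launch asynchronously.

```python
import time

def measure_latency_ms(fn, batch, n_warmup=10, n_iters=50):
    """Average wall-clock latency of fn(batch) in milliseconds.

    Hypothetical helper. For CUDA models, call torch.cuda.synchronize()
    before reading each timestamp so queued kernels are actually counted.
    """
    for _ in range(n_warmup):   # warm up caches, JIT, autotuned kernels
        fn(batch)
    t0 = time.perf_counter()
    for _ in range(n_iters):
        fn(batch)
    return (time.perf_counter() - t0) / n_iters * 1e3
```

With a batch of size 2, `measure_latency_ms(model, inputs)` would mirror the 3090 latency measurement quoted above.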
Software Dependencies: No. The paper mentions using PyTorch and TensorRT but does not specify version numbers for these software components. It also mentions the AdamW optimizer, again without a version.
Experiment Setup: Yes. "In phase 1 (low-resolution full training), we use a constant learning rate of 6.4e-5 with a weight decay of 0.1, and AdamW betas of (0.9, 0.999). We use L1 loss and LPIPS loss (Zhang et al., 2018). In phase 2 (high-resolution latent adaptation), we use a constant learning rate of 1.6e-5, a weight decay of 0.001, and AdamW betas of (0.9, 0.999). We use the same loss as phase 1. In phase 3 (low-resolution local refinement), we use a constant learning rate of 5.4e-5, and AdamW betas of (0.5, 0.9). We use L1 loss, LPIPS loss (Zhang et al., 2018), and PatchGAN loss (Isola et al., 2017). The SiT and USiT models are trained for 500k iterations with batch size 1024."
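The three-phase recipe above can be collected into a small configuration table. This is a hypothetical sketch (the phase and field names are mine); only the numeric values come from the quoted setup, and phase 3's weight decay, which the paper does not state, is left unset.

```python
from typing import Any, Dict

# Hypothetical summary of the three-phase DC-AE training recipe quoted above.
# Field names are illustrative; only the numbers come from the paper.
PHASES: Dict[str, Dict[str, Any]] = {
    "phase1_low_res_full_training": {
        "lr": 6.4e-5, "weight_decay": 0.1, "betas": (0.9, 0.999),
        "losses": ["L1", "LPIPS"],
    },
    "phase2_high_res_latent_adaptation": {
        "lr": 1.6e-5, "weight_decay": 0.001, "betas": (0.9, 0.999),
        "losses": ["L1", "LPIPS"],  # same losses as phase 1
    },
    "phase3_low_res_local_refinement": {
        "lr": 5.4e-5, "weight_decay": None,  # not specified in the paper
        "betas": (0.5, 0.9),
        "losses": ["L1", "LPIPS", "PatchGAN"],
    },
}

def adamw_kwargs(phase: str) -> Dict[str, Any]:
    """Keyword arguments one would pass to torch.optim.AdamW for a phase."""
    cfg = PHASES[phase]
    kwargs = {"lr": cfg["lr"], "betas": cfg["betas"]}
    if cfg["weight_decay"] is not None:
        kwargs["weight_decay"] = cfg["weight_decay"]
    return kwargs
```

For example, `adamw_kwargs("phase1_low_res_full_training")` returns `{"lr": 6.4e-5, "betas": (0.9, 0.999), "weight_decay": 0.1}`.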