Physics Informed Distillation for Diffusion Models

Authors: Joshua Tian Jin Tee, Kang Zhang, Hee Suk Yoon, Dhananjaya Nagaraja Gowda, Chanwoo Kim, Chang D. Yoo

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments on CIFAR-10 and ImageNet 64x64, we observe that PID achieves performance comparable to recent distillation methods. Notably, it demonstrates predictable trends concerning method-specific hyperparameters and eliminates the need for synthetic dataset generation during the distillation process, both of which contribute to its easy-to-use nature as a distillation approach for diffusion models.
Researcher Affiliation | Collaboration | Joshua Tian Jin Tee* & Kang Zhang* (EMAIL), School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST); Hee Suk Yoon (EMAIL), School of Electrical Engineering, KAIST; Dhananjaya Nagaraja Gowda (EMAIL), Samsung Research; Chanwoo Kim (EMAIL), Department of Artificial Intelligence, Korea University; Chang D. Yoo (EMAIL), School of Electrical Engineering, KAIST
Pseudocode | Yes | Algorithm 1: Physics Informed Distillation Training. Input: trained teacher model Dϕ, PID model xθ, LPIPS loss d(·, ·), learning rate η, discretization number N.
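The paper gives Algorithm 1 only as pseudocode; the shape of one such distillation step can be sketched in plain Python. Everything below is a hedged toy illustration, not the repository's code: `teacher`, `lpips`, and the one-parameter `Student` stand in for Dϕ, the LPIPS distance d(·, ·), and xθ, and the target construction is a placeholder (the paper derives its target from the teacher's probability-flow ODE).

```python
import random

def lpips(a, b):
    # Stand-in perceptual distance; the paper uses the LPIPS metric d(., .).
    return (a - b) ** 2

def teacher(x, t):
    # Stand-in for the trained teacher denoiser D_phi.
    return x / (1.0 + t)

class Student:
    # Toy x_theta(z, t) with a single scalar weight so the update is visible.
    def __init__(self):
        self.w = 0.0
    def __call__(self, z, t):
        return self.w * z * t

def train_step(student, z, eta=1e-1, N=250):
    # Sample one of N discretized time levels (Input: discretization number N).
    t = random.randint(1, N) / N
    target = teacher(z, t)   # placeholder target, not the paper's ODE-derived one
    pred = student(z, t)
    loss = lpips(pred, target)
    # Manual gradient descent on the toy quadratic loss: theta <- theta - eta * grad.
    grad = 2.0 * (pred - target) * z * t
    student.w -= eta * grad
    return loss
```

Note the absence of any synthetic dataset: each step draws a fresh (z, t) pair, matching the row above stating that PID needs no synthetic data generation during distillation.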
Open Source Code | Yes | Our code and pre-trained checkpoint are publicly available at: https://github.com/pantheon5100/pid_diffusion.git.
Open Datasets | Yes | In this section, we empirically validate our theoretical findings through various experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet 64x64 (Deng et al., 2009).
Dataset Splits | Yes | In this section, we empirically validate our theoretical findings through various experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet 64x64 (Deng et al., 2009). The results are compared according to Fréchet Inception Distance (FID) (Heusel et al., 2017b) and Inception Score (IS) (Salimans et al., 2016). All experiments for PID were initialized with the EDM teacher model, and all the competing methods were also initialized with their respective teacher diffusion model weights as noted in Table 1 and Table 2. In addition, unless stated otherwise, a discretization of 250 and the LPIPS metric were used during training. More information on the training details can be seen in Appendix A.1.
Hardware Specification | Yes | All the experiments are run with PyTorch (Paszke et al., 2019) on NVIDIA A100 GPUs. ... Table 6: Hyperparameters used for the training runs. Number of GPUs: 8x A100 (CIFAR-10), 32x A100 (ImageNet 64x64).
Software Dependencies | Yes | All the experiments are run with PyTorch (Paszke et al., 2019) on NVIDIA A100 GPUs.
Experiment Setup | Yes | More information on the training details can be seen in Appendix A.1. Additional details on training hyperparameters are shown in Table 6. All the experiments are run with PyTorch (Paszke et al., 2019) on NVIDIA A100 GPUs. ... Table 6: Hyperparameters used for the training runs (CIFAR-10 / ImageNet 64x64). Number of GPUs: 8x A100 / 32x A100. Batch size: 512 / 2048. Gradient clipping; Mixed-precision (FP16). Learning rate: 10^-4 / 2x10^-4. Dropout probability: 0% / 0%. EMA student model: 0.99995 / 0.99995.
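The "EMA student model 0.99995" row refers to an exponential moving average of the student weights, a standard trick in diffusion training. A minimal dependency-free sketch of that update, using the decay value from Table 6 (this is the generic EMA rule, not code taken from the repository):

```python
EMA_DECAY = 0.99995  # from Table 6, same for CIFAR-10 and ImageNet 64x64

def ema_update(ema_params, student_params, decay=EMA_DECAY):
    # Per-parameter update: ema <- decay * ema + (1 - decay) * student.
    # With decay this close to 1, the EMA copy drifts very slowly toward
    # the live student, smoothing out per-step training noise.
    return [decay * e + (1.0 - decay) * s
            for e, s in zip(ema_params, student_params)]
```

In practice the EMA copy, not the live student, is typically the checkpoint used for evaluation and release.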