Physics Informed Distillation for Diffusion Models
Authors: Joshua Tian Jin Tee, Kang Zhang, Hee Suk Yoon, Dhananjaya Nagaraja Gowda, Chanwoo Kim, Chang D. Yoo
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on CIFAR-10 and ImageNet 64x64, we observe that PID achieves performance comparable to recent distillation methods. Notably, it demonstrates predictable trends concerning method-specific hyperparameters and eliminates the need for synthetic dataset generation during the distillation process, both of which contribute to its easy-to-use nature as a distillation approach for diffusion models. |
| Researcher Affiliation | Collaboration | Joshua Tian Jin Tee* and Kang Zhang*, School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST); Hee Suk Yoon, School of Electrical Engineering, KAIST; Dhananjaya Nagaraja Gowda, Samsung Research; Chanwoo Kim, Department of Artificial Intelligence, Korea University; Chang D. Yoo, School of Electrical Engineering, KAIST |
| Pseudocode | Yes | Algorithm 1: Physics Informed Distillation Training. Input: trained teacher model Dϕ, PID model xθ, LPIPS loss d(·, ·), learning rate η, discretization number N. |
| Open Source Code | Yes | Our code and pre-trained checkpoint are publicly available at: https://github.com/pantheon5100/pid_diffusion.git. |
| Open Datasets | Yes | In this section, we empirically validate our theoretical findings through various experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet 64x64 (Deng et al., 2009). |
| Dataset Splits | Yes | In this section, we empirically validate our theoretical findings through various experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet 64x64 (Deng et al., 2009). The results are compared according to Fréchet Inception Distance (FID) (Heusel et al., 2017b) and Inception Score (IS) (Salimans et al., 2016). All experiments for PID were initialized with the EDM teacher model, and all the competing methods were also initialized with their respective teacher diffusion model weights as noted in Table 1 and Table 2. In addition, unless stated otherwise, a discretization of 250 and the LPIPS metric were used during training. More information on the training details can be seen in Appendix A.1. |
| Hardware Specification | Yes | All the experiments are run with PyTorch (Paszke et al., 2019) on NVIDIA A100 GPUs. ... Table 6: Hyperparameters used for the training runs. Number of GPUs: 8x A100 (CIFAR-10), 32x A100 (ImageNet 64x64) |
| Software Dependencies | Yes | All the experiments are run with PyTorch (Paszke et al., 2019) on NVIDIA A100 GPUs. |
| Experiment Setup | Yes | More information on the training details can be seen in Appendix A.1. Additional details on training hyperparameters are shown in Table 6. All the experiments are run with PyTorch (Paszke et al., 2019) on NVIDIA A100 GPUs. ... Table 6: Hyperparameters used for the training runs (CIFAR-10 / ImageNet 64x64). Number of GPUs: 8x A100 / 32x A100; Batch size: 512 / 2048; Gradient clipping; Mixed-precision (FP16); Learning rate: 10⁻⁴ / 2×10⁻⁴; Dropout probability: 0% / 0%; EMA student model: 0.99995 / 0.99995 |
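
The training loop sketched by Algorithm 1 can be illustrated with a toy, self-contained example: a one-parameter student trajectory model xθ(z, t) is fit so that its finite-difference time derivative matches a teacher's ODE drift. This is a conceptual sketch only, not the paper's implementation: the hand-picked linear drift stands in for the trained EDM teacher Dϕ, an MSE residual stands in for the LPIPS loss d(·, ·), and the `student`/`teacher_drift` functions and all constants are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_drift(x, t):
    # Toy probability-flow ODE dx/dt = -x / (1 + t); its exact
    # trajectories are x(t) = z / (1 + t). Stands in for the teacher's drift.
    return -x / (1.0 + t)

def student(w, z, t):
    # Hypothetical one-parameter trajectory model x_theta(z, t) = z * (1 + t)^w.
    # It reproduces the teacher's trajectories exactly when w = -1.
    return z * (1.0 + t) ** w

def pid_loss(w, z, t, delta=1e-3):
    # Physics-informed residual: the student's finite-difference time
    # derivative should match the teacher drift evaluated at the student's
    # own prediction (MSE here; the paper uses LPIPS on images).
    x = student(w, z, t)
    dxdt = (student(w, z, t + delta) - x) / delta
    return np.mean((dxdt - teacher_drift(x, t)) ** 2)

w = 0.5          # deliberately wrong initialization
lr = 0.5
for step in range(2000):
    z = rng.normal(size=256)                 # sampled "latents"
    t = rng.uniform(0.0, 2.0, size=256)      # sampled times
    eps = 1e-4
    # Central-difference gradient in the single scalar parameter w.
    grad = (pid_loss(w + eps, z, t) - pid_loss(w - eps, z, t)) / (2 * eps)
    w -= lr * grad

print(round(w, 2))  # w approaches the exact value -1
```

The key structural point mirrored from Algorithm 1 is that no samples from the teacher's output distribution are ever generated: the student is supervised purely by the residual of the teacher's ODE, which is why PID needs no synthetic dataset during distillation.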