Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model

Authors: Jincheng Zhong, Xiangcheng Zhang, Jianmin Wang, Mingsheng Long

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results demonstrate its substantial effectiveness across various transfer benchmarks, achieving over a 19.6% improvement in FID and a 23.4% improvement in FD_DINOv2 compared to standard fine-tuning. Notably, existing fine-tuned models can seamlessly integrate Domain Guidance to leverage these benefits, without additional training. Experimentally, we evaluate DoG across seven well-established transfer learning benchmarks, providing quantitative and qualitative evidence to substantiate its efficacy. Our comprehensive ablation study further underscores its superiority in the transfer of pre-trained diffusion models.
Researcher Affiliation Academia Jincheng Zhong, Xiangcheng Zhang, Jianmin Wang, Mingsheng Long School of Software, BNRist, Tsinghua University, China EMAIL, EMAIL
Pseudocode No The paper includes mathematical formulations (Equation 1 to 10) in sections like 'Diffusion formulation' and 'Classifier-free guidance' and 'Domain Guidance', but it does not contain explicitly labeled pseudocode blocks or algorithms.
Open Source Code Yes Code is available at this repository: https://github.com/thuml/Domain-Guidance.
Open Datasets Yes Our benchmark setups include 7 fine-grained downstream datasets: Food101 (Bossard et al., 2014), SUN397 (Xiao et al., 2010), DF20-Mini (Picek et al., 2022), Caltech101 (Griffin et al., 2007), CUB-200-2011 (Wah et al., 2011), ArtBench-10 (Liao et al., 2022), and Stanford Cars (Krause et al., 2013).
Dataset Splits Yes We fine-tune our domain model on a random partition of the whole dataset with 76,128 training images, 10,875 validation images and 21,750 test images. ... In the Stanford Cars dataset, there are 16,185 images displaying 196 distinct classes of cars. These images are divided into a training and a testing set: 8,144 images for training and 8,041 images for testing. ... ArtBench-10 ... It contains 5,000 training images and 1,000 testing images per style.
Hardware Specification Yes Each fine-tuning task is executed on a single NVIDIA A100 40GB GPU over approximately 6 hours.
Software Dependencies No All of our experiments are implemented using PyTorch and conducted on NVIDIA A100 40G GPUs. However, no specific version numbers for PyTorch or other software dependencies are provided.
Experiment Setup Yes We perform fine-tuning for 24,000 steps with a batch size of 32 at 256×256 resolution for all benchmarks. The standard fine-tuned models are trained in a CFG style, with a label dropout ratio of 10%. ... we generate 10,000 images with 50 sampling steps per benchmark, setting the guidance weights for both CFG and DoG to 1.5. ... Table 5: Hyperparameters of domain transfer experiments -- Backbone DiT-XL/2, Image Size 256, Batch Size 32, Learning Rate 1e-4, Optimizer Adam, Training Steps 24,000, Validation Interval 24,000, Sampling Steps 50.
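The guidance weight of 1.5 quoted above enters sampling through a CFG-style extrapolation. As a rough illustration only: Domain Guidance, per the paper's description, substitutes the pre-trained model's prediction for the unconditional branch of classifier-free guidance, and a minimal sketch of that combination rule looks like the following. All names here (`domain_guidance`, `eps_pre_uncond`, `eps_ft_cond`) are hypothetical stand-ins for the two models' noise predictions, not identifiers from the authors' released code.

```python
import numpy as np

def domain_guidance(eps_pre_uncond, eps_ft_cond, w=1.5):
    """CFG-style guided noise prediction (sketch, assumed form).

    eps_pre_uncond : noise prediction of the frozen pre-trained model
                     (plays the unconditional role in standard CFG)
    eps_ft_cond    : conditional noise prediction of the fine-tuned model
    w              : guidance weight (the report quotes w = 1.5)
    """
    return eps_pre_uncond + w * (eps_ft_cond - eps_pre_uncond)
```

With w = 1 the rule reduces to the fine-tuned model's conditional prediction, which is consistent with the observation that already fine-tuned models can adopt the guidance at sampling time without additional training.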