MGD$^3$ : Mode-Guided Dataset Distillation using Diffusion Models

Authors: Jeffrey A Chan Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, Mubarak Shah

ICML 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | Our approach outperforms state-of-the-art methods, achieving accuracy gains of 4.4%, 2.9%, 1.6%, and 1.6% on ImageNette, ImageIDC, ImageNet-100, and ImageNet-1K, respectively. Our method eliminates the need for fine-tuning diffusion models with distillation losses, significantly reducing computational costs. Our code is available on the project webpage: https://jachansantiago.github.io/modeguided-distillation/
Researcher Affiliation | Collaboration | 1Center for Research in Computer Vision, University of Central Florida, Orlando, Florida, United States; 2Mehta Family School of Data Science and Artificial Intelligence, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India; 3Cisco Research, San Jose, California, United States. Correspondence to: Jeffrey A. Chan-Santiago <EMAIL>.
Pseudocode | Yes | Algorithm 1: Mode Guidance with DDIM sampling, given a diffusion model ϵ_θ(x_t), an estimated mode m_k, and a mode-guidance scale λ.
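Algorithm 1 itself is not reproduced in the report, but the quoted description suggests a DDIM sampling loop whose noise estimate is steered toward an estimated mode m_k with scale λ until a stop step. The sketch below is a minimal illustration under those assumptions: the toy ᾱ schedule, the particular guidance form `eps - lam * (mode - x)`, and all function names are ours, not the authors'.

```python
import numpy as np

def mode_guided_ddim_sample(eps_theta, mode, steps=50, lam=0.1, t_stop=25, seed=0):
    """Illustrative mode-guided DDIM loop (hypothetical form, not the paper's code).

    eps_theta: callable (x, t) -> predicted noise for a trained diffusion model.
    mode:      estimated mode m_k that guides the early (noisy) steps.
    lam:       mode-guidance scale λ; guidance is applied only while t > t_stop.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(mode.shape)
    # Toy cumulative-alpha schedule: alpha_bar[0] ~ clean, alpha_bar[steps] ~ pure noise.
    alpha_bar = np.linspace(0.999, 1e-3, steps + 1)
    for t in range(steps, 0, -1):
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
        eps = eps_theta(x, t)
        if t > t_stop:
            # Hypothetical guidance term: nudge the noise estimate toward the mode.
            eps = eps - lam * (mode - x)
        # Deterministic DDIM update: predict x0, then step to the previous noise level.
        x0 = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0 + np.sqrt(1.0 - ab_prev) * eps
    return x
```

With `steps=50` and `t_stop=25` this matches the paper's reported settings of 50 sampling steps and guidance stopped at t_stop = 25.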
Open Source Code | Yes | Our code is available on the project webpage: https://jachansantiago.github.io/modeguided-distillation/
Open Datasets | Yes | The datasets we evaluate include ImageNet-1K, ImageNet-100, ImageIDC, ImageNette, and ImageNet-A through ImageNet-E. Additionally, we include results on ImageWoof in Appendix E.
Dataset Splits | No | The hard-label protocol generates a dataset with its corresponding class labels, trains a network from scratch, and evaluates the network on the original test set. This process is repeated three times for the target architectures, and the mean and standard deviation of the accuracy are reported. Random resized cropping and CutMix are applied as augmentation techniques during the target network's training. For more detailed technical information about the protocol, please refer to Gu et al. (2024).
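The evaluation loop quoted above (three train-from-scratch runs, reporting mean and standard deviation of test accuracy) can be sketched as follows; `train_and_eval` is a hypothetical stand-in for the full training pipeline with random resized crops and CutMix, which the report does not spell out.

```python
import statistics

def evaluate_hard_label_protocol(train_and_eval, n_runs=3):
    """Hard-label protocol reporting (sketch): train a target network from scratch
    on the distilled dataset n_runs times, then report mean and std of test accuracy.

    train_and_eval: hypothetical callable mapping a seed to a test accuracy.
    """
    accs = [train_and_eval(seed=s) for s in range(n_runs)]
    return statistics.mean(accs), statistics.stdev(accs)
```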
Hardware Specification | Yes | We use a single NVIDIA RTX A5000 GPU with 24GB VRAM to run our experiments.
Software Dependencies | No | No specific software dependencies with version numbers are explicitly provided in the paper.
Experiment Setup | Yes | Implementation details. Our pre-trained model G is DiT-XL/2 trained on ImageNet, and the image size is 256×256. We use the sampling strategy described in Peebles & Xie (2023): 50 sampling steps with classifier-free guidance at a guidance scale of 4.0. For Mode Guidance, we set λ to 0.1 and, in our experiments, use a stop-guidance step t_stop = 25. We use K-means to perform mode discovery, setting k = IPC. For the hard-label protocol... We train our model on a synthetic dataset for 1500 epochs for IPC values of 20, 50, and 100, and extend training to 2000 epochs for an IPC value of 10. We use Stochastic Gradient Descent (SGD) as the optimizer with a learning rate of 0.01, and a learning-rate decay scheduler at the 2/3 and 5/6 points of training with the decay factor (gamma) set to 0.2. Cross-entropy is used as the loss objective. For the soft-label protocol... We train a network for 300 epochs with a ResNet-18 architecture as both teacher and student, using the AdamW optimizer with a learning rate of 0.001, a weight decay of 0.01, and parameters β1 = 0.9 and β2 = 0.999.
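Mode discovery as described (K-means with k = IPC clusters per class, centroids serving as the modes m_k) can be sketched with a small NumPy K-means. This is an illustrative stand-in, not the authors' implementation; the feature space and the random initialization are our assumptions.

```python
import numpy as np

def discover_modes(features, ipc, iters=20, seed=0):
    """K-means over one class's feature vectors with k = IPC clusters.

    Returns the centroids, which play the role of the modes m_k that
    later guide diffusion sampling.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from IPC randomly chosen feature vectors.
    centroids = features[rng.choice(len(features), size=ipc, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster goes empty.
        for k in range(ipc):
            members = features[assign == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids
```

Running this once per class with `ipc` matching the target images-per-class yields one guiding mode per synthetic image, consistent with the k = IPC choice above.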