MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models
Authors: Jeffrey A Chan Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, Mubarak Shah
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach outperforms state-of-the-art methods, achieving accuracy gains of 4.4%, 2.9%, 1.6%, and 1.6% on ImageNette, ImageIDC, ImageNet-100, and ImageNet-1K, respectively. Our method eliminates the need for fine-tuning diffusion models with distillation losses, significantly reducing computational costs. Our code is available on the project webpage: https://jachansantiago.github.io/modeguided-distillation/ |
| Researcher Affiliation | Collaboration | 1Center for Research in Computer Vision, University of Central Florida, Orlando, Florida, United States 2Mehta Family School of Data Science and Artificial Intelligence, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India 3Cisco Research, San Jose, California, United States. Correspondence to: Jeffrey A. Chan-Santiago <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Mode Guidance with DDIM sampling, given a diffusion model ϵ_θ(x_t), an estimated mode m_k, and a mode guidance scale λ. |
| Open Source Code | Yes | Our code is available on the project webpage: https://jachansantiago.github.io/modeguided-distillation/ |
| Open Datasets | Yes | The datasets we evaluate include ImageNet-1K, ImageNet-100, ImageIDC, ImageNette, and ImageNet-A to ImageNet-E. Additionally, we include results from ImageWoof in Appendix E. |
| Dataset Splits | No | The hard-label protocol generates a dataset with its corresponding class labels, trains a network from scratch, and evaluates the network on the original test set. This process is repeated three times for target architectures, and the accuracy mean and standard deviation are reported. Random resize-crop and CutMix are applied as augmentation techniques during the target network's training. For more detailed technical information about the protocol, please refer to Gu et al. (2024). |
| Hardware Specification | Yes | We use a single NVIDIA RTX A5000 GPU with 24GB VRAM to run our experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers are explicitly provided in the paper. |
| Experiment Setup | Yes | Implementation details. Our pre-trained model G is DiT-XL/2 trained on ImageNet, and the image size is 256×256. We use the sampling strategy described in Peebles & Xie (2023), which uses 50 sampling steps with classifier-free guidance at a guidance scale of 4.0. For Mode Guidance, we set λ to 0.1, and in our experiments we use stop guidance t_stop = 25. We use K-means to perform mode discovery, setting k = IPC. For the hard-label protocol... We train our model on a synthetic dataset for 1500 epochs for IPC values of 20, 50, and 100, and extend the training to 2000 epochs for an IPC value of 10. We use Stochastic Gradient Descent (SGD) as the optimizer, setting the learning rate at 0.01. We use a learning rate decay scheduler at the 2/3 and 5/6 points of the training process, with the decay factor (gamma) set to 0.2. Cross-entropy was used as the loss objective. For the soft-label protocol... We train a network for 300 epochs with the ResNet-18 architecture as both teacher and student. We use the AdamW optimizer, with a learning rate set at 0.001, a weight decay of 0.01, and the parameters β1 = 0.9 and β2 = 0.999. |
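The setup above states that mode discovery is done with K-means, with k equal to the images-per-class (IPC) budget, and the resulting centroids act as the modes m_k. A plain-NumPy stand-in for that step (the paper does not specify which K-means implementation or feature space is used, so `features`, the iteration count, and the seeding here are assumptions):

```python
import numpy as np

def discover_modes(features, ipc, iters=50, seed=0):
    """Mode-discovery sketch: K-means with k = IPC over per-class features.

    The returned centroids play the role of the estimated modes m_k.
    `features`, `iters`, and the random seeding are illustrative
    assumptions; the paper only specifies K-means with k = IPC.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen feature vectors.
    centers = features[rng.choice(len(features), ipc, replace=False)]
    for _ in range(iters):
        # Assign each feature to its nearest centroid.
        dists = np.linalg.norm(features[:, None] - centers[None], axis=-1)
        assign = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned features.
        for k in range(ipc):
            members = features[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers
```

On two well-separated synthetic clusters with `ipc=2`, the returned centroids recover the cluster means.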
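The hard-label protocol's schedule (SGD at 0.01, decayed by gamma = 0.2 at the 2/3 and 5/6 points of training) can be written out as a small helper; this is a plain-Python sketch of a milestone scheduler such as PyTorch's MultiStepLR, with integer truncation of the milestones as an assumption:

```python
def lr_at_epoch(epoch, total_epochs, base_lr=0.01, gamma=0.2):
    """Learning rate under the hard-label protocol's schedule.

    Decays base_lr by gamma at the 2/3 and 5/6 points of training
    (milestone rounding via int() is an assumption).
    """
    milestones = [int(total_epochs * 2 / 3), int(total_epochs * 5 / 6)]
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

For the 1500-epoch runs this gives 0.01 until epoch 1000, 0.002 until epoch 1250, and 0.0004 afterward.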