Influence-Guided Diffusion for Dataset Distillation
Authors: Mingyang Chen, Jiawei Du, Bo Huang, Yi Wang, Xiaobo Zhang, Wei Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the training performance of distilled datasets generated by diffusion models can be significantly improved by integrating our IGD method, achieving state-of-the-art performance in distilling ImageNet datasets. In particular, an exceptional result is achieved on ImageNet-1K, reaching 60.3% at IPC=50. Our code is available at https://github.com/mchen725/DD_IGD. In summary, our contributions are as follows: We propose a new scheme for dataset distillation by framing the task as a guided-diffusion generation problem. We establish a novel diffusion sampling framework that pioneers the integration of the influence function as guidance for controlled diffusion generation, with the aim of achieving generalized training-enhancing objectives. Experimental results illustrate that our method significantly improves the performance of diffusion models across different architectures on two ImageNet subsets. Furthermore, a state-of-the-art result is achieved on ImageNet-1K, reaching 60.3% at IPC=50. |
| Researcher Affiliation | Academia | Mingyang Chen (1,2), Jiawei Du (3), Bo Huang (1,2), Yi Wang (4), Xiaobo Zhang (5), Wei Wang (1,2) — (1) The Hong Kong University of Science and Technology (Guangzhou); (2) The Hong Kong University of Science and Technology; (3) CFAR, A*STAR, Singapore; (4) Dongguan University of Technology; (5) Southwest Jiaotong University |
| Pseudocode | Yes | Algorithm 1 outlines the detailed process of our influence-guided diffusion sampling framework for generating each synthetic image. ... Algorithm 2: Filtering Algorithm for Influence Guidance |
| Open Source Code | Yes | Our code is available at https://github.com/mchen725/DD_IGD. |
| Open Datasets | Yes | As our primary interest lies in large-scale, high-resolution distillation tasks, we assess the performance of our method on the complete ImageNet-1K dataset (224×224) (Russakovsky et al., 2015). To provide comparable evaluations across varying task difficulties, we conduct comprehensive experiments on two representative subsets, ImageNette and ImageWoof (Howard, 2019). ... We evaluate the performance of our IGD methods on the Food-101 (Bossard et al., 2014) dataset to provide a further test on distilling other large, high-resolution datasets. ... Specifically, we compare the performance of our framework with ... on CIFAR-10 and CIFAR-100. |
| Dataset Splits | Yes | Food-101 is a challenging dataset that includes 101 food categories, totaling 101,000 images, with each category containing 250 manually reviewed test images and 750 training images. All images are scaled to a maximum side length of 256 pixels. |
| Hardware Specification | Yes | All the experimental results of our method can be obtained on a single RTX 4090 GPU. |
| Software Dependencies | No | For a fair comparison, we follow the official implementation of Minimax, utilizing a latent DiT model from PyTorch's official repository and an open-source VAE model from Stable Diffusion. DDIM (Song et al., 2020a) with 50 denoising steps is used as the vanilla sampling method for generation. ... We employ the second-order DPM solver with 20 denoising steps by default. |
| Experiment Setup | Yes | For each test dataset, we train a 6-layer ConvNet (ConvNet-6) for 50 epochs with a learning rate of 1×10⁻² to collect the surrogate checkpoints used in Equation (7). The similarity threshold for choosing representative checkpoints is set to 0.7. The detailed setup of the hyperparameters k and γt for each dataset is discussed in Appendix A.10. ... In Table 12, we provide a detailed hyperparameter configuration for k and γt in Equation (7) to replicate the results obtained across the ImageNette, ImageWoof, and ImageNet-1K datasets. |
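The quoted setup describes a denoising loop (DDIM/DPM solver) steered at each step by an influence-based guidance term with strength γt. The toy loop below only caricatures that control structure: the linear stand-ins for the noise prediction and the influence gradient, the step size `0.1`, and the name `toy_guided_sampling` are all hypothetical illustration, not the paper's Algorithm 1 or Equation (7).

```python
import numpy as np

def toy_guided_sampling(x, steps=20, gamma=0.1, target=None):
    """Schematic guided-denoising loop.

    At each step, a base update shrinks x toward zero (a stand-in for the
    denoiser's noise prediction) while a guidance gradient pulls x toward
    `target` (a stand-in for the influence-function gradient, weighted by
    the guidance strength `gamma`, playing the role of the paper's γt).
    """
    for _ in range(steps):
        eps_hat = x                # stand-in noise prediction
        guidance = x - target      # gradient of 0.5 * ||x - target||^2
        x = x - 0.1 * eps_hat - gamma * guidance
    return x

# With gamma=0.1 each step computes x <- 0.8*x + 0.1*target, so the loop
# contracts toward the fixed point 0.5*target.
result = toy_guided_sampling(np.full(4, 3.0), steps=20, gamma=0.1,
                             target=np.full(4, 1.0))
print(result)
```

The point of the sketch is the interaction of the two terms: with guidance off (`gamma=0`) the loop decays to zero, while a positive `gamma` biases every denoising step toward the guidance objective.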
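The checkpoint-collection step quoted above (gather ConvNet-6 surrogate checkpoints, then choose representative ones under a similarity threshold of 0.7) suggests a diversity filter over saved checkpoints. The greedy rule below, deduplicating checkpoints by cosine similarity of their flattened parameters, is one plausible reading; the selection criterion and the name `select_representative` are assumptions, not the paper's Algorithm 2.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two flattened parameter vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_representative(checkpoints, threshold=0.7):
    """Greedily keep checkpoints that are sufficiently different.

    A checkpoint is kept only if its cosine similarity to every
    already-kept checkpoint stays below `threshold` (0.7 in the
    quoted setup), so near-duplicate checkpoints are dropped.
    """
    kept = []
    for ckpt in checkpoints:
        if all(cosine_sim(ckpt, k) < threshold for k in kept):
            kept.append(ckpt)
    return kept

# Two checkpoints pointing the same way collapse to one representative;
# orthogonal checkpoints are both retained.
same_dir = [np.ones(8), 2.0 * np.ones(8)]
print(len(select_representative(same_dir, threshold=0.7)))
```

A higher threshold keeps more (and more redundant) checkpoints; a lower one keeps a smaller, more diverse set for the surrogate ensemble.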