Hybrid Data-Free Knowledge Distillation

Authors: Jialiang Tang, Shuo Chen, Chen Gong

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Intensive experiments across multiple benchmarks demonstrate that our HiDFD can achieve state-of-the-art performance using 120 times less collected data than existing methods.
Researcher Affiliation | Academia | 1. School of Computer Science and Engineering, Nanjing University of Science and Technology, China; 2. Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, China; 3. Jiangsu Key Laboratory of Image and Video Understanding for Social Security, China; 4. Center for Advanced Intelligence Project, RIKEN, Japan; 5. Department of Automation, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, China
Pseudocode | Yes | The whole algorithm of our proposed HiDFD is given in Appendix.
Open Source Code | Yes | https://github.com/tangjialiang97/HiDFD
Open Datasets | Yes | Original Datasets. We evaluate the effectiveness of our HiDFD on popular datasets, including CIFAR (Krizhevsky 2009), CINIC (Darlow et al. 2018), and Tiny-ImageNet (Le and Yang 2015), which are widely used by existing DFKD methods (Chen et al. 2019, 2021b). Additionally, we also conduct experiments on the large-scale ImageNet (Deng et al. 2009) and the practical medical image dataset HAM (Tschandl, Rosendahl, and Kittler 2018), which are challenging for existing DFKD methods. Collected Datasets. When using CIFAR and CINIC as the original datasets, we search for examples from ImageNet. With Tiny-ImageNet and ImageNet as the original datasets, we utilize WebVision (Li et al. 2017) as our source of collected data. Moreover, we collect examples from ISIC (Codella et al. 2018) when using HAM as the original dataset.
Dataset Splits | Yes | Here, we define the ratio between the collected data Dc and original data Do as ρ = |Dc| / |Do|. We construct small (ρ = 0.1) and moderate (ρ = 1.0) collected data for the experiments of collection-based DFKD methods. We adopt a moderate inflation factor of N = |Ds| / |Dc|, and further details are available in Extended Experiments. We follow (Chen et al. 2021b) and sample a part of examples from the corresponding dataset as collected data Dc.
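The split construction above can be sketched in a few lines: given the original dataset Do and a large web pool, sample a collected set Dc whose size satisfies the target ratio ρ = |Dc| / |Do|. This is a minimal illustration with toy data; the function name and the uniform-sampling strategy are assumptions, not the paper's exact procedure.

```python
import random

def build_collected_data(original, pool, rho, seed=0):
    """Sample a collected set D_c from a web-data pool so that
    rho = |D_c| / |D_o| hits the target ratio.
    (Hypothetical helper; the paper draws from ImageNet, WebVision,
    or ISIC depending on the original dataset.)"""
    n_collected = int(rho * len(original))
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    return rng.sample(pool, n_collected)

# Toy illustration: a 50k-example "original" set and a large web pool.
original = list(range(50_000))
pool = list(range(1_000_000))
small = build_collected_data(original, pool, rho=0.1)     # |D_c| = 5,000
moderate = build_collected_data(original, pool, rho=1.0)  # |D_c| = 50,000
```

With CIFAR-scale data (|Do| = 50,000), the small and moderate settings would yield 5,000 and 50,000 collected examples, respectively.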
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions optimizers like SGD and Adam, but does not provide specific software dependencies with version numbers, such as a Python version or deep learning framework versions (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | All student networks in our HiDFD employ SGD with a weight decay of 5×10^-4 and momentum of 0.9 as the optimizer. The student networks are trained over 240 epochs with a learning rate of 0.05, which is sequentially divided by 10 at the 150th, 180th, and 210th epochs. Meanwhile, the generator and discriminator in the GAN utilize Adam for optimization with learning rates of 1×10^-4 and 4×10^-4, respectively, and both of them are trained over 500 epochs. Additionally, the hyper-parameters in Eq. (13) are configured as λd = 0.1 and λg = 0.1.
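The student's step-decay schedule (start at 0.05, divide by 10 at epochs 150, 180, and 210) can be written as a small pure-Python function. This is a sketch of the schedule only; in a framework such as PyTorch the equivalent would be a multi-step LR scheduler attached to the SGD optimizer, but no framework is named in the paper, so the function below assumes nothing beyond the stated numbers.

```python
def student_lr(epoch, base_lr=0.05, milestones=(150, 180, 210), gamma=0.1):
    """Learning rate for the student at a given epoch (0-indexed):
    start at base_lr and multiply by gamma at each milestone epoch,
    matching the schedule quoted above."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Epochs 0-149 train at 0.05, 150-179 at 0.005,
# 180-209 at 0.0005, and 210-239 at 0.00005.
```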