GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
Authors: Xinyi Shang, Peng Sun, Tao Lin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments indicate that GIFT consistently enhances state-of-the-art dataset distillation methods across various dataset scales, without incurring additional computational costs. Importantly, GIFT significantly enhances cross-optimizer generalization, an area previously overlooked. For instance, on ImageNet-1K with IPC = 10, GIFT enhances the state-of-the-art method RDED by 30.8% in cross-optimizer generalization. |
| Researcher Affiliation | Academia | 1 University College London, 2 Zhejiang University, 3 Westlake University. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Our method does not involve any dataset distillation process. We obtain all synthetic datasets directly from the source data provided by the authors. Notably, distilled data is generated using both ConvNet and ResNet-18. We replace the loss function during evaluation. Thus, our method is a plug-and-play approach that can be easily integrated into existing dataset distillation pipelines without additional dataset synthesis or modification. A PyTorch implementation of our method is provided in Appendix F. |
| Open Source Code | Yes | Our code is available at https://github.com/LINs-lab/GIFT. We also provide a PyTorch implementation in Appendix F. |
| Open Datasets | Yes | We conduct experiments on both large-scale and small-scale datasets, including the full 224×224 ImageNet-1K (Deng et al., 2009), Tiny-ImageNet (Le & Yang, 2015) and CIFAR-100 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | We conduct evaluations on these synthetic datasets with IPC ∈ {1, 10, 50}. Additional visualizations for Tiny-ImageNet and ImageNet-1K at IPC = 1 and IPC = 50 are provided in Appendix D. Notably, our empirical study is conducted from the end-user perspective: we treat the distillation of the synthetic dataset as a black box and apply different loss functions during the evaluation of the synthetic datasets. ... Table 11 (dataset details; classes / train IPC / test IPC): CIFAR-10: 10 / 5000 / 1000; CIFAR-100: 100 / 500 / 100; Tiny-ImageNet: 200 / 500 / 50; ImageNet-1k: 1000 / 732–1300 / 50. |
| Hardware Specification | Yes | All experiments are conducted using an NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | A PyTorch implementation of our method is provided in Appendix F. (No specific version numbers for PyTorch or other libraries are provided.) |
| Experiment Setup | Yes | Hyperparameter Settings. We provide detailed hyperparameter configurations for our synthetic dataset evaluation in Appendix C. Following recent works (Yin et al., 2023; Shao et al., 2024), the evaluation on all datasets uses the parameters outlined in Table 15. We set the coefficient of label smoothing α = 0.1 and the weight hyperparameter γ = 0.1 for all methods across various synthetic datasets, as γ = 0.1 is validated to be optimal through experiments depicted in Section 5.6 and Section 8 in Appendix D. |
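The quoted setup names two hyperparameters, a label-smoothing coefficient α = 0.1 and a loss-weighting term γ = 0.1, applied as a loss-function replacement during evaluation of the distilled data. The sketch below illustrates how such a smoothed-soft-label loss with a γ-weighted auxiliary term could look in PyTorch. It is an illustrative reconstruction, not the authors' exact GIFT loss: the function names (`smooth_labels`, `gift_style_loss`) and the specific choice of KL-divergence plus a cross-entropy auxiliary term are assumptions for demonstration; consult Appendix F of the paper for the real implementation.

```python
import torch
import torch.nn.functional as F


def smooth_labels(soft_labels: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    """Blend soft labels with a uniform distribution (standard label smoothing)."""
    num_classes = soft_labels.size(-1)
    uniform = torch.full_like(soft_labels, 1.0 / num_classes)
    return (1.0 - alpha) * soft_labels + alpha * uniform


def gift_style_loss(
    student_logits: torch.Tensor,
    teacher_probs: torch.Tensor,
    alpha: float = 0.1,   # label-smoothing coefficient, as in the paper's setup
    gamma: float = 0.1,   # weight of the auxiliary term, as in the paper's setup
) -> torch.Tensor:
    """Hypothetical evaluation loss: KL to smoothed teacher soft labels,
    plus a gamma-weighted cross-entropy to the teacher's hard argmax labels.
    This composition is an assumption, not the published GIFT objective."""
    target = smooth_labels(teacher_probs, alpha)
    log_p = F.log_softmax(student_logits, dim=-1)
    kl = F.kl_div(log_p, target, reduction="batchmean")
    hard = teacher_probs.argmax(dim=-1)
    ce = F.cross_entropy(student_logits, hard)
    return kl + gamma * ce
```

Because the loss only touches the evaluation step, swapping it into an existing distillation pipeline requires no re-synthesis of the distilled images, which matches the "plug-and-play, near-zero cost" framing of the review.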