Distilling Datasets Into Less Than One Image

Authors: Asaf Shul, Eliahu Horwitz, Yedid Hoshen

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Performing extensive experiments that demonstrate the effectiveness of PoDD with as low as 0.3 IPC and achieving a new SoTA on the well-established 1 IPC benchmark.
Researcher Affiliation | Academia | Asaf Shul (EMAIL), The Hebrew University of Jerusalem; Eliahu Horwitz (EMAIL), The Hebrew University of Jerusalem; Yedid Hoshen (EMAIL), The Hebrew University of Jerusalem
Pseudocode | Yes | Algorithm 1 (PoCO): pseudocode for PoCO class ordering; Algorithm 2 (PoDD): pseudocode using PoDDL learned labels
Open Source Code | No | Project page: https://horwitz.ai/podd/ Explanation: The paper provides a 'Project page' URL, but it does not explicitly state that the source code is hosted there or provide a direct link to a code repository.
Open Datasets | Yes | We evaluate PoDD on four datasets commonly used to benchmark dataset distillation methods: i) CIFAR-10: 10 classes, 50k images of size 32×32×3 (Krizhevsky et al., 2009). ii) CIFAR-100: 100 classes, 50k images of size 32×32×3 (Krizhevsky et al., 2009). iii) CUB200: 200 classes, 6k images of size 32×32×3 (Welinder et al., 2010). iv) Tiny-ImageNet: 200 classes, 100k images of size 64×64×3 (Le & Yang, 2015).
Dataset Splits | Yes | Following the protocol of (Zhao & Bilen, 2021; Deng & Russakovsky, 2022), we evaluate the distilled poster using a set of 8 different randomly initialized models with the same ConvNet (Gidaris & Komodakis, 2018) architecture used by DSA, DM, MTT, RaT-BPTT, and others. ... The resulting amount of images from each class in our experiment are: [5000, 4750, 4500, 4250, 4000, 3750, 3500, 3250, 3000, 2750]. We did not modify the test set.
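The per-class counts quoted above follow a simple linear schedule, decreasing by 250 images per class from 5000 down to 2750. A minimal sketch reproducing that list (the helper name `per_class_counts` is ours, not from the paper):

```python
def per_class_counts(num_classes=10, start=5000, step=250):
    """Linearly decreasing per-class image counts: start, start-step, ...

    Hypothetical helper; defaults match the CIFAR-10 split quoted above.
    """
    return [start - step * i for i in range(num_classes)]

counts = per_class_counts()
print(counts)
# [5000, 4750, 4500, 4250, 4000, 3750, 3500, 3250, 3000, 2750]
```

Summing the schedule gives 38,750 training images in total, versus the full 50k CIFAR-10 training set.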
Hardware Specification | Yes | To fit the distillation into a single GPU (we use an NVIDIA A40), we use the maximal batch size we can fit into memory for a given dataset.
Software Dependencies | No | We use the same distillation hyper-parameters used by RaT-BPTT (Feng et al., 2023) except for the batch sizes. Explanation: The paper mentions using the hyperparameters from another method but does not specify any software names with version numbers for their own implementation (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | Concretely, we use: i) CIFAR-10: p = 96 (16×6) patches, bsd = 96, bs = 5000, 4k epochs. ii) CIFAR-100: p = 400 (20×20) patches, bsd = 50, bs = 2000, 2k epochs. iii) CUB200: p = 1800 (60×30) patches, bsd = 200, bs = 3000, 8k epochs. iv) Tiny-ImageNet: p = 800 (40×20) patches, bsd = 30, bs = 500, 500 epochs. We use the learned labels variant of PoDDL for all of our experiments... We use a learning rate of 0.001 for CIFAR-10, CIFAR-100, and CUB200. To fit Tiny-ImageNet into a single GPU, we use a much smaller batch size and a learning rate of 0.0005.
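The per-dataset settings quoted above can be consolidated into one place. This is a hypothetical summary, not the authors' code: the field names echo the paper's notation (p is implied by the patch grid, bsd and bs are the quoted batch sizes, lr the learning rate), while the dict layout and the name `PODD_SETUP` are our own.

```python
# Hypothetical consolidation of the quoted PoDD hyper-parameters.
# "grid" is the patch grid (rows, cols); its product equals the
# reported patch count p for each dataset.
PODD_SETUP = {
    "CIFAR-10":      {"grid": (16, 6),  "bsd": 96,  "bs": 5000, "epochs": 4000, "lr": 1e-3},
    "CIFAR-100":     {"grid": (20, 20), "bsd": 50,  "bs": 2000, "epochs": 2000, "lr": 1e-3},
    "CUB200":        {"grid": (60, 30), "bsd": 200, "bs": 3000, "epochs": 8000, "lr": 1e-3},
    "Tiny-ImageNet": {"grid": (40, 20), "bsd": 30,  "bs": 500,  "epochs": 500,  "lr": 5e-4},
}

# Sanity check: each grid multiplies out to the reported patch count p.
patch_counts = {name: cfg["grid"][0] * cfg["grid"][1] for name, cfg in PODD_SETUP.items()}
print(patch_counts)
# {'CIFAR-10': 96, 'CIFAR-100': 400, 'CUB200': 1800, 'Tiny-ImageNet': 800}
```

A table like this makes the internal consistency of the setup easy to verify: every grid shape matches the patch count p stated in the paper.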