Distilling Datasets Into Less Than One Image
Authors: Asaf Shul, Eliahu Horwitz, Yedid Hoshen
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Performing extensive experiments that demonstrate the effectiveness of PoDD with as low as 0.3 IPC and achieving a new SoTA on the well-established 1 IPC benchmark. |
| Researcher Affiliation | Academia | Asaf Shul (EMAIL), The Hebrew University of Jerusalem; Eliahu Horwitz (EMAIL), The Hebrew University of Jerusalem; Yedid Hoshen (EMAIL), The Hebrew University of Jerusalem |
| Pseudocode | Yes | Algorithm 1 (PoCO): pseudocode for PoCO class ordering; Algorithm 2 (PoDD): pseudocode using PoDDL learned labels |
| Open Source Code | No | Project page: https://horwitz.ai/podd/ Explanation: The paper provides a 'Project page' URL, but it does not explicitly state that the source code is hosted there or provide a direct link to a code repository. |
| Open Datasets | Yes | We evaluate PoDD on four datasets commonly used to benchmark dataset distillation methods: i) CIFAR-10: 10 classes, 50k images of size 32×32×3 (Krizhevsky et al., 2009). ii) CIFAR-100: 100 classes, 50k images of size 32×32×3 (Krizhevsky et al., 2009). iii) CUB200: 200 classes, 6k images of size 32×32×3 (Welinder et al., 2010). iv) Tiny-ImageNet: 200 classes, 100k images of size 64×64×3 (Le & Yang, 2015). |
| Dataset Splits | Yes | Following the protocol of (Zhao & Bilen, 2021; Deng & Russakovsky, 2022), we evaluate the distilled poster using a set of 8 different randomly initialized models with the same ConvNet (Gidaris & Komodakis, 2018) architecture used by DSA, DM, MTT, RaT-BPTT, and others. ... The resulting number of images from each class in our experiment is: [5000, 4750, 4500, 4250, 4000, 3750, 3500, 3250, 3000, 2750]. We did not modify the test set. |
| Hardware Specification | Yes | To fit the distillation into a single GPU (we use an NVIDIA A40), we use the maximal batch size we can fit into memory for a given dataset |
| Software Dependencies | No | We use the same distillation hyper-parameters used by RaT-BPTT (Feng et al., 2023) except for the batch sizes. Explanation: The paper mentions using the hyperparameters from another method but does not specify any software names with version numbers for their own implementation (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Concretely, we use: i) CIFAR-10: p = 96 (16×6) patches, bsd = 96, bs = 5000, 4k epochs. ii) CIFAR-100: p = 400 (20×20) patches, bsd = 50, bs = 2000, 2k epochs. iii) CUB200: p = 1800 (60×30) patches, bsd = 200, bs = 3000, 8k epochs. iv) Tiny-ImageNet: p = 800 (40×20) patches, bsd = 30, bs = 500, 500 epochs. We use the learned labels variant of PoDDL for all of our experiments... We use a learning rate of 0.001 for CIFAR-10, CIFAR-100, and CUB200. To fit Tiny-ImageNet into a single GPU, we use a much smaller batch size and a learning rate of 0.0005. |
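The decreasing per-class counts quoted in the Dataset Splits row follow a simple linear ramp over the 10 CIFAR-10 classes (5000 down to 2750 in steps of 250). A minimal sketch, our own reconstruction rather than the authors' code, that reproduces the list:

```python
# Sketch (not the paper's code): the per-class train-set sizes in the
# Dataset Splits row form a linear ramp starting at the full CIFAR-10
# class size (5000) and decreasing by 250 per class.
def per_class_counts(num_classes=10, max_count=5000, step=250):
    """Return a linearly decreasing image count for each class."""
    return [max_count - step * i for i in range(num_classes)]

print(per_class_counts())
# [5000, 4750, 4500, 4250, 4000, 3750, 3500, 3250, 3000, 2750]
```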
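The Experiment Setup row can be summarized as a per-dataset configuration table. The sketch below is a hypothetical restatement for readability (the key names are ours, not an API from the paper); the total patch count p is the product of the grid dimensions:

```python
# Hypothetical config summary of the PoDD settings quoted above.
# Keys: patch grid (rows, cols), distillation batch size (bsd),
# evaluation batch size (bs), epochs, and learning rate.
PODD_CONFIGS = {
    "CIFAR-10":      {"patch_grid": (16, 6),  "bsd": 96,  "bs": 5000, "epochs": 4000, "lr": 1e-3},
    "CIFAR-100":     {"patch_grid": (20, 20), "bsd": 50,  "bs": 2000, "epochs": 2000, "lr": 1e-3},
    "CUB200":        {"patch_grid": (60, 30), "bsd": 200, "bs": 3000, "epochs": 8000, "lr": 1e-3},
    "Tiny-ImageNet": {"patch_grid": (40, 20), "bsd": 30,  "bs": 500,  "epochs": 500,  "lr": 5e-4},
}

def total_patches(cfg):
    """p is the product of the patch-grid dimensions."""
    rows, cols = cfg["patch_grid"]
    return rows * cols

for name, cfg in PODD_CONFIGS.items():
    print(f"{name}: p = {total_patches(cfg)}")
# CIFAR-10: p = 96, CIFAR-100: p = 400, CUB200: p = 1800, Tiny-ImageNet: p = 800
```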