Dataset Condensation with Color Compensation
Authors: Huyu Wu, Duo Su, Junjie Hou, Guang Li
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superior performance and generalization of DC3, which outperforms SOTA methods across multiple benchmarks. To the best of our knowledge, besides focusing on downstream tasks, DC3 is the first research to fine-tune pre-trained diffusion models with condensed datasets. The Fréchet Inception Distance (FID) and Inception Score (IS) results prove that training networks with our high-quality datasets is feasible without model collapse or other degradation issues. (Supporting sections: 4 Experiments; 4.1 Experimental Settings; 4.2 Comparison with SOTA Methods; 4.3 Cross-architecture Generalization; 4.4 Ablation Study) |
| Researcher Affiliation | Academia | ¹University of Chinese Academy of Sciences, ²Tsinghua University, ³Hong Kong University of Science and Technology, ⁴Hokkaido University. *Corresponding Authors: EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Submodular Sampling. Input: C: data in bins (bins generated by clustering); M: number of data bins; N: images per class (IPC). 1: for C_j in C do 2: for x_k in C_j do 3: compute G(x_k) according to Eq. (4) 4: end for 5: S_j = ∅ 6: C'_j = sort(C_j, G, desc) 7: if N < M then 8: M = N 9: end if 10: S_j = C'_j[0 : N/M]  # selection 11: end for 12: S = ∪_{j=1}^{M} S_j. Output: S: the selected sample set. |
| Open Source Code | Yes | Code and generated data are available at https://github.com/528why/Dataset-Condensation-with-Color-Compensation. |
| Open Datasets | Yes | For large-scale datasets, we include ImageNet-1K (Deng et al., 2009) (224×224) and its subsets, such as Tiny-ImageNet (64×64). Small-scale low-resolution (32×32) analysis utilizes CIFAR-10/100 (Krizhevsky et al., 2009). To quantify task difficulty sensitivity, we benchmark ImageNette and ImageWoof, the subsets of ImageNet with 10 classes. |
| Dataset Splits | Yes | For large-scale datasets, we include ImageNet-1K (Deng et al., 2009) (224×224) and its subsets, such as Tiny-ImageNet (64×64). Small-scale low-resolution (32×32) analysis utilizes CIFAR-10/100 (Krizhevsky et al., 2009). To quantify task difficulty sensitivity, we benchmark ImageNette and ImageWoof, the subsets of ImageNet with 10 classes. Note that ImageWoof poses greater discrimination challenges due to higher inter-class similarity. We use Stable Diffusion-V1.5 and DiT-XL/2-256 as our foundation models. Following the prior works (Sun et al., 2024; Chen et al., 2025), we set IPC to 1, 10, and 50. |
| Hardware Specification | Yes | Performance validation is conducted using PyTorch on 8 NVIDIA 3090 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch for performance validation and Stable Diffusion-V1.5 and DiT-XL/2-256 as foundation models. However, it does not specify version numbers for PyTorch or other software libraries. |
| Experiment Setup | Yes | Table 11: Evaluation details for different datasets. (a) ImageNet: guidance scale 4, network ResNet18, input size 224, optimizer AdamW, learning rate 0.001, weight decay 0.01. (b) CIFAR-10 and CIFAR-100: identical settings except input size 32. (c) ImageWoof and ImageNette: identical settings with input size 224. |
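The submodular sampling pseudocode in the table above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: `gain_fn` stands in for the paper's gain function G(x) (Eq. 4 of the paper), and the guard restricting selection to the first `m` bins when IPC is smaller than the bin count is an assumed reading of the pseudocode's `if N < M then M = N` step.

```python
def submodular_sampling(bins, gain_fn, ipc):
    """Select ~IPC samples for one class by taking the top N/M
    highest-gain samples from each cluster bin.

    bins    : list of per-class cluster bins (C in the pseudocode)
    gain_fn : scores a sample; stand-in for the paper's G(x), Eq. (4)
    ipc     : images per class (N in the pseudocode)
    """
    m = len(bins)                 # M: number of data bins
    if ipc < m:                   # assumed reading of the "if N < M" guard
        m = ipc
    selected = []
    for bin_j in bins[:m]:
        # C'_j = sort(C_j, G, desc): rank samples by gain, descending
        ranked = sorted(bin_j, key=gain_fn, reverse=True)
        # S_j = C'_j[0 : N/M]: keep the top N/M samples from this bin
        selected.extend(ranked[: max(1, ipc // m)])
    return selected               # S = union of all S_j
```

With identity gain and two bins, `submodular_sampling([[1.0, 5.0, 3.0], [2.0, 4.0]], lambda x: x, 2)` picks the single highest-gain sample from each bin.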
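The evaluation settings in Table 11 share every hyperparameter except the input size, so they can be expressed compactly as config dictionaries. The dictionary keys and structure below are illustrative choices, not the paper's code; the values come directly from Table 11.

```python
# Shared evaluation hyperparameters (Table 11); only input_size varies.
COMMON = {
    "guidance_scale": 4,
    "network": "ResNet18",
    "optimizer": "AdamW",
    "learning_rate": 1e-3,
    "weight_decay": 0.01,
}

EVAL_CONFIGS = {
    "ImageNet":   {**COMMON, "input_size": 224},
    "CIFAR-10":   {**COMMON, "input_size": 32},
    "CIFAR-100":  {**COMMON, "input_size": 32},
    "ImageWoof":  {**COMMON, "input_size": 224},
    "ImageNette": {**COMMON, "input_size": 224},
}
```

Factoring the shared settings into `COMMON` makes the single point of variation (32×32 for CIFAR, 224×224 for the ImageNet-derived sets) explicit.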