Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Authors: Jin-Young Kim, Hyojun Go, Soonwoo Kwon, Hyun-Gyoon Kim
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate these advantages through comprehensive experiments in image generation tasks, including unconditional, class-conditional, and text-to-image generation. |
| Researcher Affiliation | Academia | Jin-Young Kim, Hyojun Go, Soonwoo Kwon, Hyun-Gyoon Kim (Ajou University) |
| Pseudocode | Yes | The detailed process and overall curriculum learning procedure are outlined in Algorithm ?? and ?? in Appendix D, respectively. |
| Open Source Code | No | The paper does not provide a direct link to a source-code repository, an explicit statement about code release, or mention code in supplementary materials. |
| Open Datasets | Yes | By integrating our curriculum learning strategy into architectures DiT (Peebles & Xie, 2022), EDM (Karras et al., 2022), and SiT (Ma et al., 2024), we demonstrate the efficacy of our approach in enhancing performance, accelerating convergence speed, and maintaining compatibility with existing techniques. ... These include unconditional generation, class-conditional generation, and text-to-image generation, utilizing datasets such as FFHQ (Karras et al., 2019), ImageNet (Deng et al., 2009), and MS-COCO (Lin et al., 2014). |
| Dataset Splits | Yes | utilizing datasets such as FFHQ (Karras et al., 2019), ImageNet (Deng et al., 2009), and MS-COCO (Lin et al., 2014). By integrating our curriculum learning strategy into architectures DiT (Peebles & Xie, 2022), EDM (Karras et al., 2022), and SiT (Ma et al., 2024), we demonstrate the efficacy of our approach in enhancing performance, accelerating convergence speed, and maintaining compatibility with existing techniques. ... For our comprehensive evaluation of various methods, we employed three distinct image-generation tasks: 1) Unconditional generation with the FFHQ dataset (Karras et al., 2019), 2) Class-conditional generation with CIFAR-10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets, and 3) Text-to-Image generation with MS-COCO dataset (Lin et al., 2014). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We examined the robustness of the proposed curriculum training with respect to hyper-parameters: the number of clusters N and the maximum patience τ. As shown in Fig. 4, our method consistently outperforms the vanilla model, and the best result is observed at N = 20, τ = 200. . . . We trained Di T-L/2 with 2M iterations and reported the results in Table 2. |
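The Experiment Setup row mentions two curriculum hyper-parameters: the number of clusters N and the maximum patience τ. The paper's exact pacing rule is given in its appendix algorithms (not reproduced here), but the general shape of a patience-based curriculum over N task clusters can be sketched as follows. All class and variable names are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of a patience-based curriculum schedule: denoising tasks are
# assumed to be grouped into N clusters ordered from easy to hard, and training
# unlocks the next cluster once the monitored loss has not improved for
# `max_patience` consecutive evaluations. Names here are hypothetical.

class PatienceCurriculum:
    def __init__(self, num_clusters: int = 20, max_patience: int = 200):
        self.num_clusters = num_clusters      # N in the paper's notation
        self.max_patience = max_patience      # tau in the paper's notation
        self.stage = 0                        # hardest cluster currently unlocked
        self.best_loss = float("inf")
        self.patience = 0

    def active_clusters(self) -> range:
        """Clusters currently included in the training distribution."""
        return range(self.stage + 1)

    def update(self, val_loss: float) -> None:
        """Advance the curriculum when the monitored loss plateaus."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.patience = 0
        elif self.stage < self.num_clusters - 1:
            self.patience += 1
            if self.patience >= self.max_patience:
                self.stage += 1               # unlock the next, harder cluster
                self.best_loss = float("inf") # reset plateau tracking
                self.patience = 0
```

Under this reading, the paper's best setting (N = 20, τ = 200) corresponds to `PatienceCurriculum(num_clusters=20, max_patience=200)`; the actual improvement criterion and cluster ordering should be checked against Algorithms in Appendix D of the paper.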