Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

Authors: Mang Ning, Mingxiao Li, Jianlin Su, Jia Haozhe, Lanmiao Liu, Martin Benes, Wenshuo Chen, Albert Ali Salah, Itir Onal Ertugrul

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on different frameworks (UViT, DiT), generation tasks, and various diffusion samplers demonstrate that DCTdiff outperforms pixel-based diffusion models regarding generative quality and training efficiency. Table 2. FID-50k of UViT and DCTdiff using DDIM sampler and DPM-Solver under different NFEs.
Researcher Affiliation Collaboration 1Utrecht University, the Netherlands 2KU Leuven, Belgium 3Moonshot AI, China 4Shandong University, China 5Max Planck Institute for Psycholinguistics, the Netherlands 6University of Innsbruck, Austria.
Pseudocode Yes Algorithm 1 Bound of Naive Scaling Algorithm 2 Bound of Entropy-Consistent Scaling
Open Source Code Yes The code is available at https://github.com/forever208/DCTdiff.
Open Datasets Yes The datasets include CIFAR-10 (Krizhevsky et al., 2009), CelebA 64 (Liu et al., 2015), ImageNet 64 (Chrabaszcz et al., 2017), FFHQ 128, FFHQ 256, FFHQ 512 (Karras et al., 2019) and AFHQ 512 (Choi et al., 2020).
Dataset Splits No The paper uses common datasets such as CIFAR-10, CelebA, ImageNet, FFHQ, and AFHQ, and reports FID-50k, which implies comparing generated samples against real samples. However, it does not explicitly state the training, validation, and test splits (e.g., percentages or specific counts) used for model training on these datasets, nor does it reference standard splits for its experimental setup.
Hardware Specification Yes We generate 10k samples using one A100 GPU.
Software Dependencies No The paper mentions using frameworks like UViT and DiT but does not provide specific version numbers for these or other key software dependencies such as programming languages (e.g., Python) or libraries (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup Yes We list the model and training parameters in Table 9 and Table 10, where the former compares UViT and DCTdiff (inherited from UViT) and the latter compares DiT and DCTdiff (inherited from DiT). We use the default training settings from UViT and DiT without any change. Regarding the choice of DCTdiff parameters, we find that the fixed τ = 98 used for Entropy-Consistent Scaling is effective on all datasets, possibly due to the statistical consistency of image frequency distributions. The block size B and SNR Scaling factor c depend only on the image resolution; one can refer to Table 9 to determine B and c given a new dataset. Finally, the frequency elimination parameter m can be calculated from Eq. (6). Table 9. Training and network parameters of UViT and DCTdiff on different datasets.
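To illustrate the DCT-space representation that the block size B refers to, the following is a minimal, self-contained sketch of a block-wise 2D DCT (the standard orthonormal DCT-II, implemented here from scratch with NumPy). The image size and B = 4 are hypothetical; this is not the paper's implementation, which selects B per resolution as listed in Table 9.

```python
import numpy as np

def dct2_matrix(B):
    # Orthonormal DCT-II basis matrix of size B x B.
    n = np.arange(B)
    k = n[:, None]
    M = np.cos(np.pi * (2 * n + 1) * k / (2 * B))
    M[0] *= 1 / np.sqrt(2)
    return M * np.sqrt(2 / B)

def blockwise_dct2(img, B):
    # Split an HxW image into non-overlapping BxB blocks and
    # apply a separable 2D DCT (D @ block @ D.T) to each block.
    H, W = img.shape
    D = dct2_matrix(B)
    blocks = img.reshape(H // B, B, W // B, B).transpose(0, 2, 1, 3)
    return np.einsum('ij,abjk,lk->abil', D, blocks, D)

# Toy 8x8 "image" with block size B = 4 (both values are illustrative).
img = np.arange(64, dtype=float).reshape(8, 8)
coeffs = blockwise_dct2(img, 4)
print(coeffs.shape)  # (2, 2, 4, 4): a 2x2 grid of 4x4 DCT coefficient blocks
```

Because the transform is orthonormal, it preserves signal energy and is exactly invertible via the transposed basis, which is what makes modeling in DCT space a lossless change of representation before any frequency elimination is applied.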