DiffNat : Exploiting the Kurtosis Concentration Property for Image quality improvement

Authors: Aniket Roy, Maitreya Suin, Anshul Shah, Ketul Shah, Jiang Liu, Rama Chellappa

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate the proposed approach on four diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, (3) image super-resolution, and (4) blind face-restoration. Integrating the proposed KC loss and perceptual guidance has improved the perceptual quality in all these tasks in terms of FID, MUSIQ score, and user evaluation. Code: https://github.com/aniket004/DiffNat.git"
Researcher Affiliation | Collaboration | Aniket Roy (Johns Hopkins University), Maitreya Suin (Samsung AI Center Toronto), Anshul Shah (Johns Hopkins University), Ketul Shah (Johns Hopkins University), Jiang Liu (AMD), Rama Chellappa (Johns Hopkins University)
Pseudocode | Yes |
Algorithm 1: Kurtosis Concentration loss
  Input: Diffusion model (f_θ), training images (x), condition vector (c)
  Output: KC loss L_KC
  1. ε ∼ N(0, I)                                  // Sample random noise
  2. x_gen = f_θ(x, ε, c)                         // Generate image
  3. g_gen,1, g_gen,2, g_gen,3, ... = DWT(x_gen)  // Wavelet-decomposed sub-bands
  4. L_KC = E_{x,c,ε}[max_i κ(g_gen,i) − min_i κ(g_gen,i)]  // Compute the KC loss
Algorithm 2: Perceptual Guidance
  Input: Base diffusion model (θ_B), diffusion model trained with KC (θ_P), prompt (c), guidance scale (γ)
  Output: output image (x_0)
  x_T ∼ N(0, I)
  for t in T, T−1, ..., 1 do ...
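The core of Algorithm 1 can be sketched in NumPy. This is an illustrative reimplementation, not the authors' released code: it uses a single-level Haar DWT as a stand-in for the paper's Daubechies filter, and all function and variable names (`kurtosis`, `haar_dwt_bands`, `kc_loss`) are my own.

```python
import numpy as np

def kurtosis(band):
    """Fourth standardized moment of a wavelet sub-band."""
    x = band.ravel()
    mu, var = x.mean(), x.var()
    return ((x - mu) ** 4).mean() / (var ** 2 + 1e-12)

def haar_dwt_bands(img):
    """Single-level 2D Haar DWT; returns the LH, HL, HH high-frequency sub-bands."""
    a = img[0::2, 0::2]  # top-left pixels of each 2x2 block
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return [lh, hl, hh]

def kc_loss(img):
    """KC loss: spread between the largest and smallest sub-band kurtosis."""
    ks = [kurtosis(g) for g in haar_dwt_bands(img)]
    return max(ks) - min(ks)
```

By construction the loss is non-negative, and minimizing it pushes the kurtosis of all sub-bands toward a common value, which is the kurtosis concentration property the paper exploits.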
Open Source Code | Yes | "Code: https://github.com/aniket004/DiffNat.git"
Open Datasets | Yes | "We investigate and experimentally verify this property for natural images on large datasets, e.g., FFHQ dataset (Fig. 9(c)), Dreambooth dataset, Oxford-flowers dataset (in Appendix). ... We experimented with the Oxford-flowers Nilsback & Zisserman (2006), CelebA-faces Zhang et al. (2020), CelebA-HQ Karras et al. (2017), Stanford-Dogs Khosla et al. (2011) and Stanford-Cars Krause et al. (2013) datasets... For training, we use the standard FFHQ dataset Karras et al. (2017)..."
Dataset Splits | Yes | "We evaluate randomly sampled 3000 images from CelebA-Test dataset Karras et al. (2017) under the same 2×, 4× and 8× SR setting in Tab. 4, Tab. 3 and Tab. 5 respectively. ... and evaluate on a subset of CelebA test set with a resolution of 256x256."
Hardware Specification | Yes | "The time and space complexity of CFG and PG for SDXL on an A5000 machine (single-image inference) are presented in Tab. 10. ... The experiments for Dreambooth, Custom Diffusion, DDPM have been performed on a single A5000 machine with 24GB GPU. We have performed guided diffusion (GD) and latent diffusion (LD) experiments on a server of 8 24GB A5000 GPUs."
Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes |
Table 13: Hyperparameters
  Coefficient of L_recon: 1
  Coefficient of L_prior: 1
  Coefficient of L_KC: 1
  Perceptual guidance scale: 1.001
  Learning rate: 10^-5
  Batch size (Dreambooth, Custom Diffusion): 8
  Batch size (DDPM): 125
  Batch size (GD): 16
  Batch size (LD): 9
  Text-to-image diffusion model: Stable Diffusion v1 (Rombach et al., 2022)
  Number of class prior images (Dreambooth, Custom Diffusion): 10
  Number of DWT components: 25
  DWT filter: Daubechies
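For quick reference, the Table 13 values can be collected into a plain config dict. This is only a transcription aid; the key names below are my own invention and do not necessarily match how the released code organizes its settings.

```python
# Hyperparameters transcribed from Table 13 of the paper.
# Key names are illustrative, not taken from the official repository.
hparams = {
    "coef_recon_loss": 1.0,          # weight on L_recon
    "coef_prior_loss": 1.0,          # weight on L_prior
    "coef_kc_loss": 1.0,             # weight on the KC loss
    "perceptual_guidance_scale": 1.001,
    "learning_rate": 1e-5,
    "batch_size": {
        "dreambooth_custom_diffusion": 8,
        "ddpm": 125,
        "guided_diffusion": 16,
        "latent_diffusion": 9,
    },
    "text_to_image_model": "Stable Diffusion v1",
    "num_class_prior_images": 10,    # Dreambooth / Custom Diffusion
    "num_dwt_components": 25,
    "dwt_filter": "Daubechies",
}
```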