DiffNat: Exploiting the Kurtosis Concentration Property for Image Quality Improvement
Authors: Aniket Roy, Maitreya Suin, Anshul Shah, Ketul Shah, Jiang Liu, Rama Chellappa
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the proposed approach on four diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, (3) image super-resolution, and (4) blind face-restoration. Integrating the proposed KC loss and perceptual guidance has improved the perceptual quality in all these tasks in terms of FID, MUSIQ score, and user evaluation. Code: https://github.com/aniket004/DiffNat.git |
| Researcher Affiliation | Collaboration | Aniket Roy (Johns Hopkins University), Maitreya Suin (Samsung AI Center Toronto), Anshul Shah (Johns Hopkins University), Ketul Shah (Johns Hopkins University), Jiang Liu (AMD), Rama Chellappa (Johns Hopkins University) |
| Pseudocode | Yes | Algorithm 1: Kurtosis Concentration loss. Input: Diffusion model (f_θ), training images (x), condition vector (c). Output: KC loss L_KC. 1. ϵ ∼ N(0, I) ; // Sample random noise 2. x_gen = f_θ(x, ϵ, c) ; // Generate image 3. g_gen,1, g_gen,2, g_gen,3, ... = DWT(x_gen) ; // Wavelet-decomposed images 4. L_KC = E_{x,c,ϵ}[max_i κ(g_gen,i) − min_i κ(g_gen,i)] ; // Compute the KC loss. Algorithm 2: Perceptual Guidance. Input: Base diffusion model (θ_B), diffusion model trained with KC (θ_P), prompt (c), guidance scale (γ). Output: output image (x_0). x_T ∼ N(0, I); for t in T, T-1, ..., 1 do ... |
| Open Source Code | Yes | Code: https://github.com/aniket004/DiffNat.git |
| Open Datasets | Yes | We investigate and experimentally verify this property for natural images on large datasets, e.g., FFHQ dataset (Fig. 9(c)), Dreambooth dataset, Oxford-flowers dataset (in Appendix). ... We experimented with the Oxford-flowers Nilsback & Zisserman (2006), CelebA-faces Zhang et al. (2020), CelebA-HQ Karras et al. (2017), Stanford-Dogs Khosla et al. (2011) and Stanford-Cars Krause et al. (2013) datasets... For training, we use the standard FFHQ dataset Karras et al. (2017)... |
| Dataset Splits | Yes | We evaluate 3000 randomly sampled images from the CelebA-Test dataset Karras et al. (2017) under the same 2×, 4× and 8× SR settings in Tab. 4, Tab. 3 and Tab. 5 respectively. ... and evaluate on a subset of the CelebA test set with a resolution of 256x256. |
| Hardware Specification | Yes | The time and space complexity of CFG and PG for SDXL on an A5000 machine (single-image inference) are presented in Tab. 10. ... The experiments for Dreambooth, Custom diffusion, and DDPM have been performed on a single A5000 machine with a 24GB GPU. We have performed guided diffusion (GD) and latent diffusion (LD) experiments on a server with 8 × 24GB A5000 GPUs. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Table 13: Hyperparameters. Coefficient of L_recon: 1; Coefficient of L_prior: 1; Coefficient of L_KC: 1; Perceptual guidance scale: 1.001; Learning rate: 10^-5; Batch size (Dreambooth, Custom diffusion): 8; Batch size (DDPM): 125; Batch size (GD): 16; Batch size (LD): 9; Text-to-image diffusion model: Stable Diffusion-v1 Rombach et al. (2022); Number of class prior images (Dreambooth, Custom diffusion): 10; Number of DWT components: 25; DWT filter: Daubechies |
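The KC loss in Algorithm 1 (max minus min kurtosis across wavelet subbands) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: it uses a single-level Haar transform and raw fourth-moment kurtosis, whereas the paper reports Daubechies filters and 25 DWT components; `haar_dwt2`, `kurtosis`, and `kc_loss` are hypothetical names introduced here.

```python
import numpy as np

def kurtosis(x):
    # Raw fourth standardized moment of a flattened subband.
    x = np.asarray(x, dtype=float).ravel()
    mu, sigma = x.mean(), x.std()
    return ((x - mu) ** 4).mean() / (sigma ** 4 + 1e-12)

def haar_dwt2(img):
    # One-level 2D Haar wavelet transform -> [LL, LH, HL, HH] subbands.
    a = (img[0::2, :] + img[1::2, :]) / 2.0  # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0  # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return [ll, lh, hl, hh]

def kc_loss(img):
    # KC loss: spread of kurtosis values across the wavelet subbands.
    # Natural images concentrate kurtosis, so minimizing this spread
    # pushes generated images toward natural-image statistics.
    kappas = [kurtosis(b) for b in haar_dwt2(img)]
    return max(kappas) - min(kappas)
```

In training, `kc_loss` would be computed on the generated image `x_gen = f_θ(x, ϵ, c)` and added (with coefficient 1, per Table 13) to the reconstruction and prior losses; a differentiable framework's DWT would replace this NumPy sketch.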