DiffNat: Exploiting the Kurtosis Concentration Property for Image Quality Improvement
Authors: Aniket Roy, Maitreya Suin, Anshul Shah, Ketul Shah, Jiang Liu, Rama Chellappa
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the proposed approach on four diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, (3) image super-resolution, and (4) blind face-restoration. Integrating the proposed KC loss and perceptual guidance has improved the perceptual quality in all these tasks in terms of FID, MUSIQ score, and user evaluation. Code: https://github.com/aniket004/DiffNat.git |
| Researcher Affiliation | Collaboration | Aniket Roy (Johns Hopkins University), Maitreya Suin (Samsung AI Center Toronto), Anshul Shah (Johns Hopkins University), Ketul Shah (Johns Hopkins University), Jiang Liu (AMD), Rama Chellappa (Johns Hopkins University) |
| Pseudocode | Yes | Algorithm 1: Kurtosis Concentration loss. Input: Diffusion model (f_θ), training images (x), condition vector (c). Output: KC loss L_KC. 1. ϵ ∼ N(0, I) ; // Sample random noise 2. x_gen = f_θ(x, ϵ, c) ; // Generate image 3. g_gen,1, g_gen,2, g_gen,3, ... = DWT(x_gen) ; // Wavelet-decomposed images 4. L_KC = E_{x,c,ϵ}[max_i κ(g_gen,i) − min_i κ(g_gen,i)] ; // Compute the KC loss. Algorithm 2: Perceptual Guidance. Input: Base diffusion model (θ_B), diffusion model trained with KC (θ_P), prompt (c), guidance scale (γ). Output: output image (x_0). x_T ∼ N(0, I); for t in T, T-1, ..., 1 do ... |
| Open Source Code | Yes | Code: https://github.com/aniket004/DiffNat.git |
| Open Datasets | Yes | We investigate and experimentally verify this property for natural images on large datasets, e.g., FFHQ dataset (Fig. 9(c)), Dreambooth dataset, Oxford-flowers dataset (in Appendix). ... We experimented with the Oxford-flowers Nilsback & Zisserman (2006), CelebA-faces Zhang et al. (2020), CelebA-HQ Karras et al. (2017), Stanford-Dogs Khosla et al. (2011) and Stanford-Cars Krause et al. (2013) datasets... For training, we use the standard FFHQ dataset Karras et al. (2017)... |
| Dataset Splits | Yes | We evaluate 3000 randomly sampled images from the CelebA-Test dataset Karras et al. (2017) under the same 2×, 4× and 8× SR settings in Tab. 4, Tab. 3 and Tab. 5 respectively. ... and evaluate on a subset of the CelebA test set with a resolution of 256x256. |
| Hardware Specification | Yes | The time and space complexity of CFG and PG for SDXL on an A5000 machine (single-image inference) are presented in Tab. 10. ... The experiments for Dreambooth, Custom diffusion, and DDPM have been performed on a single A5000 machine with a 24GB GPU. We have performed guided diffusion (GD) and latent diffusion (LD) experiments on a server with 8 × 24GB A5000 GPUs. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Table 13: Hyperparameters. Coefficient of L_recon: 1; Coefficient of L_prior: 1; Coefficient of L_KC: 1; Perceptual guidance scale: 1.001; Learning rate: 10^-5; Batch size (Dreambooth, Custom diffusion): 8; Batch size (DDPM): 125; Batch size (GD): 16; Batch size (LD): 9; Text-to-image diffusion model: Stable Diffusion-v1 Rombach et al. (2022); Number of class prior images (Dreambooth, Custom diffusion): 10; Number of DWT components: 25; DWT filter: Daubechies |
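The KC loss in Algorithm 1 (max minus min kurtosis across wavelet subbands) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: it uses a single-level Haar transform and raw fourth-moment kurtosis, whereas the paper reports Daubechies filters and 25 DWT components; `haar_dwt2`, `kurtosis`, and `kc_loss` are hypothetical names introduced here.

```python
import numpy as np

def kurtosis(x):
    # Raw fourth standardized moment of a flattened subband.
    x = np.asarray(x, dtype=float).ravel()
    mu, sigma = x.mean(), x.std()
    return ((x - mu) ** 4).mean() / (sigma ** 4 + 1e-12)

def haar_dwt2(img):
    # One-level 2D Haar wavelet transform -> [LL, LH, HL, HH] subbands.
    a = (img[0::2, :] + img[1::2, :]) / 2.0  # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0  # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return [ll, lh, hl, hh]

def kc_loss(img):
    # KC loss: spread of kurtosis values across the wavelet subbands.
    # Natural images concentrate kurtosis, so minimizing this spread
    # pushes generated images toward natural-image statistics.
    kappas = [kurtosis(b) for b in haar_dwt2(img)]
    return max(kappas) - min(kappas)
```

In training, `kc_loss` would be computed on the generated image `x_gen = f_θ(x, ϵ, c)` and added (with coefficient 1, per Table 13) to the reconstruction and prior losses; a differentiable framework's DWT would replace this NumPy sketch.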