LGDM: Latent Guidance in Diffusion Models for Perceptual Evaluations

Authors: Shreshth Saini, Ru-Ling Liao, Yan Ye, Alan Bovik

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that these hyperfeatures exhibit high correlation with human perception in IQA tasks. Our method can be applied to any existing pretrained latent diffusion model and is straightforward to integrate. To the best of our knowledge, this paper is the first work on guiding diffusion models with perceptual features for NR-IQA. Extensive experiments on IQA datasets show that our method, LGDM, achieves state-of-the-art performance, underscoring the superior generalization capabilities of diffusion models for NR-IQA tasks.
Researcher Affiliation | Collaboration | (1) Laboratory for Image and Video Engineering (LIVE), The University of Texas at Austin, Texas, USA. (2) Alibaba Group, Sunnyvale, USA. Correspondence to: Shreshth Saini <EMAIL>.
Pseudocode | Yes | Algorithm 1: LGDM: Latent Guidance in Diffusion Models
Open Source Code | No | The text references the source code of a third-party tool or platform that the authors used, but the authors do not provide their own implementation code.
Open Datasets | Yes | To thoroughly evaluate the effectiveness of our proposed method, we conducted extensive experiments on ten publicly available and well-recognized IQA datasets, covering synthetic distortions, authentic distortions, and the latest AI-generated content (AIGC). These datasets are summarized in Table 8 (Appendix D).
Dataset Splits | Yes | Following (Saha et al., 2023; Madhusudana et al., 2022), we split each dataset into training, validation, and test sets (70%, 10%, and 20%, respectively), using source image-based splits to prevent content overlap.
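The source-image-based split described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the `source_of` mapping from a distorted image to its source-image ID is an assumed helper, since IQA datasets encode this differently.

```python
import random

def source_based_split(image_paths, source_of, seed=0):
    """Split images 70/10/20 into train/val/test BY SOURCE IMAGE, so all
    distorted versions of one source land in the same split (no content
    overlap). `source_of` is an assumed helper mapping image -> source ID."""
    sources = sorted({source_of(p) for p in image_paths})
    rng = random.Random(seed)
    rng.shuffle(sources)
    n = len(sources)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    train_src = set(sources[:n_train])
    val_src = set(sources[n_train:n_train + n_val])
    splits = {"train": [], "val": [], "test": []}
    for p in image_paths:
        s = source_of(p)
        key = "train" if s in train_src else ("val" if s in val_src else "test")
        splits[key].append(p)
    return splits
```

Splitting by source rather than by individual image is the key point: a random per-image split would leak near-identical content (other distortion levels of the same scene) from train into test.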
Hardware Specification | Yes | All experiments were conducted on an NVIDIA A100 GPU using PyTorch.
Software Dependencies | Yes | For LDM, we use the widely adopted Stable Diffusion v1.5 (Rombach et al., 2022), pretrained on the LAION-5B dataset (Schuhmann et al., 2022). For text conditioning, we use an empty string as the prompt. We run 10 DDIM steps, with t within the range (0, 100], and set the hyperparameters ζ1 and ζ2 to 1 and 0.2, respectively. For regression, we use a small neural network with two hidden layers. We use Pearson Linear Correlation Coefficient (PLCC) and Spearman's Rank Order Correlation Coefficient (SRCC) as evaluation metrics. The impact of the choice of ψp is discussed in detail in the ablation study and Appendix D. All experiments were conducted on an NVIDIA A100 GPU using PyTorch. Additional implementation details are provided in Appendix C.
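The two reported metrics, PLCC and SRCC, can be computed as below. This is a plain-numpy sketch (SRCC is PLCC over ranks); it ignores tie-averaging, which library implementations such as `scipy.stats.spearmanr` handle.

```python
import numpy as np

def plcc(pred, mos):
    """Pearson Linear Correlation Coefficient between predicted scores
    and mean opinion scores (MOS)."""
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    return float(np.corrcoef(pred, mos)[0, 1])

def srcc(pred, mos):
    """Spearman's Rank Order Correlation Coefficient: Pearson correlation
    computed on the ranks. Simplification: ties are not average-ranked."""
    def ranks(x):
        order = np.argsort(np.asarray(x))
        r = np.empty(len(x))
        r[order] = np.arange(len(x))
        return r
    return plcc(ranks(pred), ranks(mos))
```

PLCC rewards a linear relationship with MOS, while SRCC only rewards monotonicity, which is why IQA papers report both.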
Experiment Setup | Yes | We run 10 DDIM steps, with t within the range (0, 100], and set the hyperparameters ζ1 and ζ2 to 1 and 0.2, respectively. For regression, we use a small neural network with two hidden layers.
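The regression head is described only as "a small neural network with two hidden layers", so the sketch below is an assumption-laden illustration: the hidden width (128), ReLU activations, and He initialization are guesses, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_regressor(in_dim, hidden=128):
    """Two-hidden-layer MLP mapping a feature vector to one quality score.
    Hidden width and activation are illustrative; the paper does not
    specify them. Returns a list of (weight, bias) pairs."""
    dims = [in_dim, hidden, hidden, 1]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])]

def predict(params, x):
    """Forward pass: ReLU on the two hidden layers, linear output."""
    h = np.asarray(x, float)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h  # shape (batch, 1): one scalar quality score per input
```

A head this small is typical for IQA: the heavy lifting is done by the frozen diffusion features, and the regressor only maps them to a scalar score.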