Diffusion State-Guided Projected Gradient for Inverse Problems

Authors: Rayhan Zirvi, Bahareh Tolooshams, Anima Anandkumar

ICLR 2025

Reproducibility: Variable | Result | LLM Response
Research Type: Experimental. We demonstrate that DiffStateGrad significantly improves the performance of state-of-the-art (SOTA) methods, especially in challenging tasks such as phase retrieval and high dynamic range reconstruction. For example, DiffStateGrad improves the PSNR of ReSample (Song et al., 2023a) from 27.61 (8.07) to 31.19 (4.33) for phase retrieval, reporting mean (std). Our experiments cover a wide range of linear inverse problems of box inpainting, random inpainting, Gaussian deblur, motion deblur, and super-resolution (Tables 3 and 4) and nonlinear inverse problems of phase retrieval, nonlinear deblur, and high dynamic range (HDR) (Table 3) for image restoration tasks.
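The PSNR figures above are reported as mean (std) over the evaluation set. A minimal NumPy sketch of that computation (the helper names here are my own, not from the paper):

```python
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio for images normalized to [0, max_val]."""
    mse = np.mean((reference - restored) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

def report_mean_std(scores) -> str:
    """Format a list of per-image scores as 'mean (std)', matching the table's convention."""
    return f"{np.mean(scores):.2f} ({np.std(scores):.2f})"

# Toy example on two random image pairs.
rng = np.random.default_rng(0)
ref = rng.random((2, 8, 8))
rec = np.clip(ref + 0.01 * rng.standard_normal(ref.shape), 0.0, 1.0)
print(report_mean_std([psnr(r, x) for r, x in zip(ref, rec)]))
```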
Researcher Affiliation: Academia. Rayhan Zirvi, Bahareh Tolooshams, & Anima Anandkumar, Computing and Mathematical Sciences, California Institute of Technology.
Pseudocode: Yes. Algorithm 1: Diffusion State-Guided Projected Gradient (DiffStateGrad) for Latent Diffusion-based Inverse Problems (Image Restoration Tasks).
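Algorithm 1 itself is not quoted in this table, but its key step, per the experiment-setup row, projects the measurement-guidance gradient using an SVD of the diffusion state with a variance retention threshold τ. The NumPy sketch below is an illustrative reconstruction under that reading, not the paper's exact procedure: the rank is chosen to retain a τ fraction of squared singular values, and the gradient is restricted to the state's top-k left/right singular subspaces.

```python
import numpy as np

def select_rank(singular_values: np.ndarray, tau: float = 0.99) -> int:
    """Smallest rank whose squared singular values retain a tau fraction of variance."""
    energy = singular_values ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, tau) + 1)

def project_gradient(state: np.ndarray, grad: np.ndarray, tau: float = 0.99) -> np.ndarray:
    """Project grad onto the low-rank subspace of the diffusion state (assumed form)."""
    u, s, vt = np.linalg.svd(state, full_matrices=False)
    k = select_rank(s, tau)
    u_k, vt_k = u[:, :k], vt[:k, :]
    # Keep only the gradient components lying in the state's top-k column/row spaces.
    return u_k @ (u_k.T @ grad @ vt_k.T) @ vt_k

rng = np.random.default_rng(0)
state = rng.standard_normal((16, 16))
grad = rng.standard_normal((16, 16))
projected = project_gradient(state, grad, tau=0.99)
print(projected.shape)  # (16, 16)
```

In a solver loop this projection would replace the raw gradient before the guided update, applied every P iterations as described in the experiment-setup row.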
Open Source Code: Yes. Our code is available at https://github.com/Anima-Lab/DiffStateGrad.
Open Datasets: Yes. We demonstrate the effectiveness of DiffStateGrad on two datasets: a) the FFHQ 256×256 validation dataset (Karras et al., 2021), and b) the ImageNet 256×256 validation dataset (Deng et al., 2009). For pixel-based experiments, we use (i) the pretrained diffusion model from (Chung et al., 2023) for the FFHQ dataset, and (ii) the pre-trained model from (Dhariwal & Nichol, 2021) for the ImageNet dataset. For latent diffusion experiments, we use (i) the unconditional LDM-VQ-4 model trained on FFHQ (Rombach et al., 2022) for the FFHQ dataset, and (ii) the Stable Diffusion v1.5 (Rombach et al., 2022) model for the ImageNet dataset. We also conduct an additional experiment for Magnetic Resonance Imaging (MRI) (see Appendix E). ... We utilize the Compressed Sensing Generative Model (Jalal et al., 2021)... The unconditional diffusion model was trained on T2-weighted brain datasets from the NYU fastMRI dataset (Zbontar et al., 2018; Knoll et al., 2020).
Dataset Splits: Yes. For evaluation, we sample a fixed set of 100 images from the FFHQ and ImageNet validation sets. Images are normalized to the range [0, 1]. We use the default settings for all experiments (see Appendix C for more details). For linear inverse problems, we consider (1) box inpainting, (2) random inpainting, (3) Gaussian deblur, (4) motion deblur, and (5) super-resolution. In the box inpainting task, a random 128×128 box is used, while the random inpainting task employs a 70% random mask. Gaussian and motion deblurring tasks utilize kernels of size 61×61, with standard deviations of 3.0 and 0.5, respectively. For super-resolution, images are downscaled by a factor of 4 using a bicubic resizer. For nonlinear inverse problems, we consider (1) phase retrieval, (2) nonlinear deblur, and (3) high dynamic range (HDR). For phase retrieval, we use an oversampling rate of 2.0, and due to the instability and nonuniqueness of reconstruction, we adopt the strategy from DPS (Chung et al., 2023) and DAPS (Zhang et al., 2024), generating four separate reconstructions and reporting the best result. ... The unconditional diffusion model was trained on T2-weighted brain datasets from the NYU fastMRI dataset (Zbontar et al., 2018; Knoll et al., 2020), and the reported results were averaged over 30 test examples (reporting avg (std)).
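Two of the degradation operators described above are easy to make concrete: the 61×61 Gaussian blur kernel with standard deviation 3.0, and the 70% random inpainting mask. A NumPy sketch (helper names are my own; the paper presumably uses the operator implementations from its baselines):

```python
import numpy as np

def gaussian_kernel(size: int = 61, sigma: float = 3.0) -> np.ndarray:
    """Normalized 2-D Gaussian blur kernel, matching the Gaussian deblur setting."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def random_inpainting_mask(height: int, width: int, drop_frac: float = 0.7,
                           seed: int = 0) -> np.ndarray:
    """Binary mask that drops drop_frac of the pixels (the 70% random mask)."""
    rng = np.random.default_rng(seed)
    return (rng.random((height, width)) >= drop_frac).astype(np.float32)

k = gaussian_kernel()
mask = random_inpainting_mask(256, 256)
print(k.shape, float(k.sum()), float(mask.mean()))
```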
Hardware Specification: Yes. We conduct these experiments on the box inpainting task using an NVIDIA GeForce RTX 4090 GPU with 24 GB of VRAM. Each method is run with its default settings on a set of 100 images from FFHQ 256×256, and we measure the average runtime in seconds per image.
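The runtime figure described here is average wall-clock seconds per image over the 100-image set. A generic timing harness of the kind implied (the `restore` callable is a hypothetical stand-in for a full sampling loop):

```python
import time

def average_runtime(restore, images) -> float:
    """Average wall-clock seconds per image for a restoration callable."""
    start = time.perf_counter()
    for image in images:
        restore(image)
    return (time.perf_counter() - start) / len(images)

# Toy stand-in; real use would wrap one method's complete per-image solve.
avg = average_runtime(lambda x: x, list(range(100)))
print(f"{avg:.6f} s/image")
```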
Software Dependencies: No. The paper refers to existing open-source implementations for baselines (PSLD, ReSample, DPS, DAPS) but does not provide specific version numbers for core software dependencies such as Python, PyTorch, or CUDA used in the authors' own implementation of DiffStateGrad.
Experiment Setup: Yes. For all main experiments across all four methods, we use the variance retention threshold τ = 0.99. For all experiments involving PSLD, DPS, and DAPS, we perform the DiffStateGrad projection step every iteration (P = 1). For all experiments involving ReSample, we perform the step every five iterations (P = 5). ... For linear inverse problems, we consider (1) box inpainting, (2) random inpainting, (3) Gaussian deblur, (4) motion deblur, and (5) super-resolution. In the box inpainting task, a random 128×128 box is used, while the random inpainting task employs a 70% random mask. Gaussian and motion deblurring tasks utilize kernels of size 61×61, with standard deviations of 3.0 and 0.5, respectively. For super-resolution, images are downscaled by a factor of 4 using a bicubic resizer. For nonlinear inverse problems, we consider (1) phase retrieval, (2) nonlinear deblur, and (3) high dynamic range (HDR). For phase retrieval, we use an oversampling rate of 2.0... For nonlinear deblur, we use the default setting from (Tran et al., 2021). For HDR, we use a scale factor of 2.