Conformal Bounds on Full-Reference Image Quality for Imaging Inverse Problems
Authors: Jeffrey Wen, Rizwan Ahmad, Philip Schniter
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our approach on image denoising and accelerated magnetic resonance imaging (MRI) problems. Code is available at https://github.com/jwen307/quality_uq. ... We now consider two imaging inverse problems: image denoising and accelerated MRI. For each, we evaluate the proposed bounds using the PSNR, SSIM (Wang et al., 2004), LPIPS (Zhang et al., 2018), and DISTS (Ding et al., 2020a) metrics. |
| Researcher Affiliation | Academia | Jeffrey Wen, Department of Electrical and Computer Engineering, The Ohio State University; Rizwan Ahmad, Department of Biomedical Engineering, The Ohio State University; Philip Schniter, Department of Electrical and Computer Engineering, The Ohio State University |
| Pseudocode | No | The paper describes methods and procedures in prose within the main text (e.g., Section 3 'Proposed approach') and does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Code is available at https://github.com/jwen307/quality_uq. |
| Open Datasets | Yes | For true images, we use a random subset of 4000 images from the Flickr Faces HQ (FFHQ) (Karras et al., 2019) validation dataset, to which we added white Gaussian noise of standard deviation σ = 0.75 to create the measurements y0. We utilize the non-fat-suppressed subset of the multicoil fastMRI knee dataset (Zbontar et al., 2018), yielding 17286 training images and 2188 validation images. |
| Dataset Splits | Yes | The first 1000 images were used to train the predictor f(·; θ) in (9) and the remaining 3000 were used for calibration and testing. For each trial t ∈ {1, . . . , T}, we randomly select 70% of the 3000 non-training samples to create the calibration set dcal[t] with indices i ∈ Ical[t], and we use the remaining 30% of the non-training samples for a test fold with indices k ∈ Itest[t]. We utilize the non-fat-suppressed subset of the multicoil fastMRI knee dataset (Zbontar et al., 2018), yielding 17286 training images and 2188 validation images. As before, we evaluate performance over T = 10 000 Monte Carlo trials with a random 70% calibration and 30% test split of the validation data. |
| Hardware Specification | Yes | Using a single NVIDIA V100 GPU with 32GB of memory, computing a single DDRM sample takes approximately 2.73 seconds. The E2E-VarNet takes approximately 104ms to generate a single posterior sample, while the CNF takes about 1.22 seconds to generate 32 posterior samples (corresponding to c = 32) on a single NVIDIA V100. |
| Software Dependencies | Yes | We use the TorchMetrics (Borovec et al., 2022) package under the Apache 2.0 license to compute PSNR, SSIM, and LPIPS. To compute the quadratic program for Sec. 4.1, we use the qpsolvers (Caron et al., 2024) package under an LGPL 3.0 license along with the CVXOPT (Andersen et al., 2023) package under a GNU General Public License. All models use the PyTorch (Paszke et al., 2019) framework with a custom license allowing open use. The E2E-VarNet and CNF are implemented using PyTorch Lightning (Falcon et al., 2019) under an Apache 2.0 license. |
| Experiment Setup | Yes | Following Kawar et al. (2022a), we run DDRM with a Denoising Diffusion Probabilistic Model (DDPM) (Ho et al., 2020) pretrained on the CelebA-HQ dataset (Karras et al., 2018). To increase sampling diversity, we used η = 1 and ηb = 0.5 but set all other hyperparameters at their default values. The model was trained for 50 epochs with a batch size of 16 and learning rate of 0.0001 using SSIM (Wang et al., 2004) as the loss function. The model is trained for 150 epochs with batch size 8 and learning rate 0.0001. |
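The 70%/30% calibration/test splitting quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration of the protocol, not code from the authors' repository; `calibration_test_split` and its arguments are our own names.

```python
import random

def calibration_test_split(indices, cal_frac=0.7, seed=None):
    """Randomly partition sample indices into a calibration set and a test set.

    Sketches one Monte Carlo trial of the paper's repeated random
    70%/30% calibration/test split over the non-training samples.
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_cal = int(round(cal_frac * len(shuffled)))
    return shuffled[:n_cal], shuffled[n_cal:]

# One trial over the 3000 non-training FFHQ samples:
cal, test = calibration_test_split(range(3000), cal_frac=0.7, seed=0)
print(len(cal), len(test))  # 2100 900
```

In the paper this draw is repeated T = 10 000 times, with the bound calibrated on the 70% fold and evaluated on the held-out 30% fold in each trial.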
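For reference on the metrics named above: PSNR for signals with peak value `max_val` is 10·log10(max_val² / MSE). The paper computes it with TorchMetrics; the pure-Python sketch below is only a reference implementation of the standard definition, with illustrative names of our own.

```python
import math

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio between two equal-length pixel sequences,
    using the standard definition 10 * log10(max_val^2 / MSE)."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * math.log10(max_val ** 2 / mse)

print(round(psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]), 2))  # 21.76
```

SSIM, LPIPS, and DISTS are structural and learned perceptual metrics with no comparably short closed form, which is why packaged implementations are used in practice.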