PnP-Flow: Plug-and-Play Image Restoration with Flow Matching

Authors: Ségolène Martin, Anne Gagneux, Paul Hagemann, Gabriele Steidl

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate its performance on denoising, super-resolution, deblurring, and inpainting tasks, demonstrating superior results compared to existing PnP algorithms and Flow Matching based state-of-the-art methods. Code available at https://github.com/annegnx/PnP-Flow. (...) We evaluate all methods on two datasets: CelebA (Yang et al., 2015), with images resized to 128×128, and AFHQ-Cat, a subset of the Animal Faces HQ dataset (Choi et al., 2020) focused on the cat class, with images resized to 256×256. (...) We report benchmark results for all methods across several restoration tasks, measuring average PSNR and Structural Similarity (SSIM) on 100 test images. To ensure reproducibility, all experiments were seeded. Results are presented in Table 1 for CelebA and Table 2 for AFHQ-Cat.
Researcher Affiliation | Academia | 1 Technische Universität Berlin; 2 ENS de Lyon, CNRS, Université Claude Bernard Lyon 1, Inria, LIP, UMR 5668
Pseudocode | Yes | Algorithm 1: FBS (...) Algorithm 2: PnP-FBS (...) Algorithm 3: PnP Flow Matching
Open Source Code | Yes | Code available at https://github.com/annegnx/PnP-Flow. (...) The code with all benchmark methods is available at https://github.com/annegnx/PnP-Flow and is now included in the DeepInv library.
Open Datasets | Yes | We evaluate all methods on two datasets: CelebA (Yang et al., 2015), with images resized to 128×128, and AFHQ-Cat, a subset of the Animal Faces HQ dataset (Choi et al., 2020) focused on the cat class, with images resized to 256×256. All images are normalized to the range [−1, 1]. (...) For this, we use the MNIST dataset (LeCun et al., 1998), rescaling each image to lie on the simplex, and train a Flow Matching model.
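The [−1, 1] normalization quoted above is standard preprocessing; a minimal sketch of what it amounts to is below. The function name `to_model_range` is ours for illustration — the actual repository may implement this differently (e.g. via torchvision transforms).

```python
import numpy as np

def to_model_range(img_uint8):
    """Map uint8 pixel values in [0, 255] to the [-1, 1] range
    described in the excerpt (sketch; name and API are illustrative)."""
    x = img_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    return 2.0 * x - 1.0                      # shift/stretch to [-1, 1]
```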
Dataset Splits | Yes | For CelebA, we use the standard training, validation, and test splits. For AFHQ-Cat, as no validation split is provided, we randomly select 32 images from the test set to create a validation set.
Hardware Specification | Yes | All experiments in this section are conducted on a single NVIDIA RTX 6000 Ada Generation with 48GB RAM.
Software Dependencies | No | The paper mentions the 'DeepInv library' and the 'Adam optimizer' but does not specify version numbers for any software components. For example, 'implemented in the DeepInv library (Tachella et al., 2023)' and 'Adam optimizer (Kingma & Ba, 2017)' do not provide specific software versions for reproducibility.
Experiment Setup | Yes | The training parameters were a learning rate of 10⁻⁴, 200 epochs with a batch size of 128 for CelebA, and 400 epochs with a batch size of 64 for AFHQ-Cat. (...) Our proposed method has two hyper-parameters: the exponent α in the learning rate schedule γ_n = (1 − t_n)^α, and the number N of uniformly spaced time steps, set to N = 100 for most experiments. (...) The optimal values identified for each dataset and problem scenario are detailed in Appendix A.8.
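The step-size schedule γ_n = (1 − t_n)^α over N uniformly spaced time steps can be sketched as follows. The exact time grid (e.g. whether it includes t = 1) is an assumption of this sketch; here we use t_n = n/N so that γ_n stays strictly positive.

```python
import numpy as np

def pnp_flow_schedule(N=100, alpha=1.0):
    """Step sizes gamma_n = (1 - t_n)**alpha on a uniform time grid.

    Sketch of the schedule quoted in the excerpt; the grid choice
    t_n = n / N (excluding t = 1) is our assumption, not the paper's.
    """
    t = np.arange(N) / N          # uniformly spaced times in [0, 1)
    return (1.0 - t) ** alpha     # decaying step sizes gamma_n
```

With α = 1 the step size decays linearly from 1 toward 0; larger α makes the decay sharper near the end of the trajectory.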