InstaRevive: One-Step Image Enhancement via Dynamic Score Matching

Authors: Yixuan Zhu, Haolin Wang, Ao Li, Wenliang Zhao, Yansong Tang, Jingxuan Niu, Lei Chen, Jie Zhou, Jiwen Lu

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments substantiate the efficacy of our framework across a diverse array of challenging tasks and datasets, unveiling the compelling efficacy and efficiency of InstaRevive in delivering high-quality and visually appealing results. Code is available at https://github.com/EternalEvan/InstaRevive.
Researcher Affiliation Academia 1 Department of Automation, Tsinghua University; 2 Tsinghua Shenzhen International Graduate School, Tsinghua University
Pseudocode No The paper describes the methodology using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code is available at https://github.com/EternalEvan/InstaRevive.
Open Datasets Yes For blind face restoration (BFR), we utilize the Flickr-Faces-HQ (FFHQ) (Karras et al., 2019), which encompasses 70,000 high-resolution images. We resize them to 512×512 to match the input scale of the diffusion model. For evaluation, we leverage the widely used CelebA-Test dataset (Liu et al., 2015), which consists of 3,000 synthetic HQ-LQ pairs. Additionally, to further validate the effectiveness on real-world data, we employ two wild face datasets, LFW-Test (Wang et al., 2021b) and WIDER-Test (Zhou et al., 2022), which contain face images with varying degrees of degradation. For blind image super-resolution, we train our framework using the large-scale ImageNet dataset (Deng et al., 2009) and evaluate on RealSR (Cai et al., 2019) and RealSet65 (Yue et al., 2023).
Dataset Splits No The paper mentions specific datasets used for training (e.g., FFHQ, ImageNet) and evaluation (e.g., CelebA-Test, RealSR, RealSet65), often providing the number of images in the test sets. However, it does not explicitly provide the training/validation/test split percentages or sample counts for the primary datasets (e.g., how FFHQ or ImageNet were split for training and validation by the authors, for reproducibility of their specific splits); it only refers to the evaluation datasets.
Hardware Specification Yes For BFR and BSR, we employ the high-order degradation model in (Wang et al., 2021c), training for 25K and 35K steps with 4 Nvidia A800 GPUs, respectively. To provide a clear comparison of model parameters and inference time (evaluated on an Nvidia 3090 GPU), we present Table 5
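Inference-time comparisons like those reported in Table 5 are typically produced by averaging wall-clock time over repeated runs after a warmup phase. A minimal timing-harness sketch (the `run_model` callable is a hypothetical stand-in, not the authors' code; on a GPU, a device synchronization would also be needed before each timestamp):

```python
import time

def time_inference(run_model, warmup=3, repeats=10):
    """Return average wall-clock seconds per call, after warmup runs."""
    for _ in range(warmup):
        run_model()  # discard warmup iterations (caches, JIT, allocator)
    start = time.perf_counter()
    for _ in range(repeats):
        run_model()
    return (time.perf_counter() - start) / repeats

# Example with a trivial CPU stand-in workload:
avg_seconds = time_inference(lambda: sum(i * i for i in range(10000)))
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and has the highest available resolution for interval measurement.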
Software Dependencies No The paper mentions the use of specific models like BLIP, IP-Adapter, VQGAN, and optimizers like AdamW, but it does not specify the version numbers for any software, libraries, or programming languages used (e.g., Python, PyTorch, CUDA versions).
Experiment Setup Yes Both models are optimized with a batch size of 32 and a learning rate of 1e-6 using two AdamW optimizers with a weight decay of 1e-2. We initialize the generator and two score estimators by replicating the denoising transformer blocks in (Chen et al., 2023a). For BFR and BSR, we employ the high-order degradation model in (Wang et al., 2021c), training for 25K and 35K steps with 4 Nvidia A800 GPUs, respectively. For face cartoonization, we finetune our pre-trained BFR model for an additional 10K steps. We set the KL term weight to 1.0 and the control factor to 1.5 for optimal performance.
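To make the reported optimizer settings concrete, here is a minimal pure-Python sketch of a single AdamW update with the paper's stated hyperparameters (learning rate 1e-6, weight decay 1e-2). The scalar parameter and toy gradient are illustrative only; the actual models, batch size 32, and dual-optimizer setup are not reproduced here:

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-6, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a scalar parameter (decoupled weight decay)."""
    m = betas[0] * m + (1 - betas[0]) * grad          # first-moment EMA
    v = betas[1] * v + (1 - betas[1]) * grad ** 2     # second-moment EMA
    m_hat = m / (1 - betas[0] ** t)                   # bias correction
    v_hat = v / (1 - betas[1] ** t)
    # Decoupled weight decay: applied to the parameter itself,
    # not folded into the gradient as in plain Adam + L2.
    param -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v

# Example: a few steps on a toy quadratic loss L(w) = w^2, so grad = 2w.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    w, m, v = adamw_step(w, 2 * w, m, v, t)
```

With a learning rate this small, each step moves the parameter by only about 1e-6, which is consistent with the gentle finetuning regime the paper describes.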