InstaRevive: One-Step Image Enhancement via Dynamic Score Matching
Authors: Yixuan Zhu, Haolin Wang, Ao Li, Wenliang Zhao, Yansong Tang, Jingxuan Niu, Lei Chen, Jie Zhou, Jiwen Lu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments substantiate the efficacy of our framework across a diverse array of challenging tasks and datasets, unveiling the compelling efficacy and efficiency of InstaRevive in delivering high-quality and visually appealing results. Code is available at https://github.com/EternalEvan/InstaRevive. |
| Researcher Affiliation | Academia | 1Department of Automation, Tsinghua University 2Tsinghua Shenzhen International Graduate School, Tsinghua University |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/EternalEvan/InstaRevive. |
| Open Datasets | Yes | For blind face restoration (BFR), we utilize the Flickr-Faces-HQ (FFHQ) (Karras et al., 2019), which encompasses 70,000 high-resolution images. We resize them to 512×512 to match the input scale of the diffusion model. For evaluation, we leverage the widely used CelebA-Test dataset (Liu et al., 2015), which consists of 3,000 synthetic HQ-LQ pairs. Additionally, to further validate the effectiveness on real-world data, we employ 2 wild face datasets, LFW-Test (Wang et al., 2021b) and WIDER-Test (Zhou et al., 2022), which contain face images with varying degrees of degradation. For blind image super-resolution, we train our framework using the large-scale ImageNet dataset (Deng et al., 2009) and evaluate on the RealSR (Cai et al., 2019) and RealSet65 (Yue et al., 2023). |
| Dataset Splits | No | The paper mentions specific datasets used for training (e.g., FFHQ, ImageNet) and evaluation (e.g., CelebA-Test, RealSR, RealSet65), often providing the number of images in the test sets. However, it does not explicitly provide the training/validation/test split percentages or sample counts for the primary datasets (e.g., how FFHQ or ImageNet were split for training and validation by the authors for reproducibility of their specific splits), only referring to evaluation datasets. |
| Hardware Specification | Yes | For BFR and BSR, we employ the high-order degradation model in (Wang et al., 2021c), training for 25K and 35K steps with 4 Nvidia A800 GPUs, respectively. To provide a clear comparison of model parameters and inference time (evaluated on an Nvidia 3090 GPU), we present Table 5 |
| Software Dependencies | No | The paper mentions the use of specific models like BLIP, IP-Adapter, VQGAN, and optimizers like AdamW, but it does not specify the version numbers for any software, libraries, or programming languages used (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Both models are optimized with a batch size of 32 and a learning rate of 1e-6 using two AdamW optimizers with a weight decay of 1e-2. We initialize the generator and two score estimators by replicating the denoising transformer blocks in (Chen et al., 2023a). For BFR and BSR, we employ the high-order degradation model in (Wang et al., 2021c), training for 25K and 35K steps with 4 Nvidia A800 GPUs, respectively. For face cartoonization, we finetune our pre-trained BFR model for an additional 10K steps. We set the KL term weight to 1.0 and the control factor to 1.5 for optimal performance. |
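For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is an illustrative summary only; the field names (e.g., `TRAIN_CONFIG`) are not taken from the authors' released code.

```python
# Hedged sketch of the reported InstaRevive training setup.
# All values come from the paper's Experiment Setup description;
# the dictionary layout and key names are illustrative assumptions.
TRAIN_CONFIG = {
    "batch_size": 32,
    "learning_rate": 1e-6,
    "optimizer": "AdamW",       # two AdamW optimizers: generator + score estimators
    "weight_decay": 1e-2,
    "kl_term_weight": 1.0,
    "control_factor": 1.5,
    "training_steps": {
        "BFR": 25_000,          # blind face restoration
        "BSR": 35_000,          # blind image super-resolution
        "cartoonization_finetune": 10_000,
    },
    "hardware": "4x Nvidia A800",
}

# Effective number of images seen during BFR training under these settings.
bfr_images_seen = TRAIN_CONFIG["batch_size"] * TRAIN_CONFIG["training_steps"]["BFR"]
print(bfr_images_seen)  # 800000
```

Such a summary makes it easy to spot what a reproduction would still need to pin down: software versions, dataset splits, and the exact degradation-model parameters.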