Solving Video Inverse Problems Using Image Diffusion Models

Authors: Taesung Kwon, Jong Chul Ye

ICLR 2025

Reproducibility Variable — Result — Evidence (LLM Response)
Research Type — Experimental. Evidence: "In this section, we conduct thorough comparison studies to demonstrate the efficacy of the proposed method in addressing spatio-temporal degradations. Specifically, we consider two types of loss functions for video inverse problems: ... We present the quantitative results of the temporal degradation tasks in Table 1. The table shows that the proposed method outperforms the baseline methods by large margins in all metrics. ... Fig. 4 shows the qualitative reconstruction results for temporal degradations. ... The results of the spatio-temporal degradations are presented in Table 2 and Fig. 5."
Researcher Affiliation — Academia. Evidence: "Taesung Kwon (1), Jong Chul Ye (2); (1) Dept. of Bio & Brain Engineering, KAIST; (2) Kim Jae Chul Graduate School of AI, KAIST"
Pseudocode — Yes. Evidence: "Algorithm 1: Video inverse problem solver using 2D diffusion models"
Open Source Code — Yes. Evidence: "Project page: https://svi-diffusion.github.io/ ... The detailed data preprocessing code and the preprocessed NumPy files have all been open-sourced."
Open Datasets — Yes. Evidence: "We conduct our experiments on the DAVIS dataset (Perazzi et al., 2016; Pont-Tuset et al., 2017)... The GoPro dataset consists of 240 fps videos captured using a GoPro camera... (Nah et al., 2017). ... additional experiments on a high-frame-rate dataset (collected from Pexels)."
Dataset Splits — Yes. Evidence: "We conducted every experiment using the train/val sets of the DAVIS 2017 dataset (Perazzi et al., 2016; Pont-Tuset et al., 2017). ... A total of 338 video samples were used for evaluation."
Hardware Specification — Yes. Evidence: "Memory issues arise when performing DPS sampling with batch sizes larger than 5 on an NVIDIA GeForce RTX 4090 GPU with 24GB VRAM. ... With a single RTX 4090 GPU (24GB VRAM), it can reconstruct a 32-frame video at the same resolution. ... feasible on GPUs like the GTX 1080Ti or RTX 2080Ti (11GB VRAM)."
Software Dependencies — No. Evidence: "The pre-trained unconditional 256×256 image diffusion model from ADM (Dhariwal & Nichol, 2021) is used directly without fine-tuning or additional networks. ... The resizing was performed using the resize function from the cv2 library."
Experiment Setup — Yes. Evidence: "For all proposed methods, we employ l = 5, η = 0.15 for 20 NFE in temporal degradation tasks, and l = 5, η = 0.8 for 100 NFE in spatio-temporal degradation tasks unless specified otherwise. ... For DiffusionMBIR, ... (ρ, λ) = (0.1, 0.001) for temporal degradation and (ρ, λ) = (0.01, 0.01) for spatio-temporal degradation. ... For DPS, ... ζ = 30 for both temporal and spatio-temporal degradation. ... For ADMM-TV, the outer ADMM iterations are run for 30 steps and the inner CG iterations for 20 steps... The parameters are set to (ρ, λ) = (1, 0.001). We initialize X as zeros."
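For reference, the hyperparameters quoted in the Experiment Setup row can be collected into plain config dictionaries. This is only an organizational sketch: the key names (`eta`, `nfe`, `rho`, `lam`, etc.) and the `lookup` helper are illustrative, not the authors' API.

```python
# Hyperparameters as reported in the paper's experiment setup (sketch only;
# dictionary layout and key names are our own, not the released code's).
PROPOSED = {
    "temporal":        {"l": 5, "eta": 0.15, "nfe": 20},
    "spatio_temporal": {"l": 5, "eta": 0.8,  "nfe": 100},
}
DIFFUSION_MBIR = {
    "temporal":        {"rho": 0.1,  "lam": 0.001},
    "spatio_temporal": {"rho": 0.01, "lam": 0.01},
}
DPS = {"zeta": 30}  # same step size for both degradation types
ADMM_TV = {"rho": 1, "lam": 0.001, "outer_admm_iters": 30, "inner_cg_iters": 20}

def lookup(task: str) -> dict:
    """Return the proposed method's settings for a task type."""
    return PROPOSED[task]

print(lookup("temporal"))  # {'l': 5, 'eta': 0.15, 'nfe': 20}
```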
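The Pseudocode row cites "Algorithm 1: Video inverse problem solver using 2D diffusion models". The report does not reproduce the algorithm itself, so the following is only a generic illustration of the idea it names: run a frame-wise DDIM update with an image diffusion model treated as a batch denoiser, enforcing data consistency on the observed measurements at each step. The noise schedule, masking degradation, and zero-noise `eps_model` stub are all our own toy assumptions, not the authors' Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy measurement: every other frame observed (a stand-in for a temporal
# degradation); shapes and values are illustrative only.
x_true = rng.standard_normal((8, 16, 16))
mask = np.zeros_like(x_true, dtype=bool)
mask[::2] = True                      # observed frames
y = np.where(mask, x_true, 0.0)       # measurements

def eps_model(x, t):
    """Stub for a pretrained 2D image diffusion model (a real network such as
    ADM predicts the noise eps(x, t) for each frame independently)."""
    return np.zeros_like(x)

def solve(nfe=20, eta=0.15):
    """Frame-wise DDIM sampling with hard data consistency (generic sketch)."""
    T = nfe
    abar = np.linspace(0.999, 0.01, T + 1)  # toy alpha-bar schedule, abar[0] ~ 1
    x = rng.standard_normal(x_true.shape)
    for t in range(T, 0, -1):
        a_t, a_prev = abar[t], abar[t - 1]
        eps = eps_model(x, t)
        x0 = (x - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)  # denoised estimate
        x0 = np.where(mask, y, x0)    # hard data consistency on observed pixels
        sigma = eta * np.sqrt((1 - a_prev) / (1 - a_t) * (1 - a_t / a_prev))
        x = (np.sqrt(a_prev) * x0
             + np.sqrt(max(1.0 - a_prev - sigma**2, 0.0)) * eps
             + sigma * rng.standard_normal(x.shape))
    return x

x_hat = solve()
print(x_hat.shape)  # (8, 16, 16)
```

The actual method additionally uses a batch-consistent sampling strategy across frames; this sketch only shows the per-frame DDIM/data-consistency skeleton.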
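The Software Dependencies row notes that preprocessing resizes frames with `cv2.resize`. A minimal sketch of that call, assuming a 256×256 target to match the ADM model (the input frame shape here is invented for illustration):

```python
import numpy as np
import cv2  # opencv-python

# Hypothetical input frame; the paper's inputs are resized for the 256x256
# ADM diffusion model.
frame = np.random.randint(0, 256, (480, 854, 3), dtype=np.uint8)

# cv2.resize takes dsize as (width, height); bilinear is the default mode.
resized = cv2.resize(frame, (256, 256), interpolation=cv2.INTER_LINEAR)
print(resized.shape)  # (256, 256, 3)
```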