Video Diffusion Models Are Strong Video Inpainter
Authors: Minhyeok Lee, Suhwan Cho, Chajin Shin, Jungho Lee, Sunghun Yang, Sangyoun Lee
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct various comparative experiments to demonstrate that the proposed FFF-VDI outperforms previous methods in both video completion and object removal across different scenarios. In Table 1, we quantitatively compare our proposed FFF-VDI with state-of-the-art methods on the YouTube-VOS and DAVIS datasets. |
| Researcher Affiliation | Academia | Yonsei University EMAIL |
| Pseudocode | No | The paper describes the proposed approach through textual descriptions and architectural diagrams (Figure 2, Figure 3) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about providing access to source code or a link to a code repository. |
| Open Datasets | Yes | To fairly compare previous state-of-the-art models with the proposed FFF-VDI, we use the YouTube-VOS (Xu et al. 2018) training set as the training data. Additionally, for model evaluation, we use the widely known YouTube-VOS (Xu et al. 2018) and DAVIS (Perazzi et al. 2016) test sets as evaluation datasets. |
| Dataset Splits | Yes | To fairly compare previous state-of-the-art models with the proposed FFF-VDI, we use the YouTube-VOS (Xu et al. 2018) training set as the training data. Additionally, for model evaluation, we use the widely known YouTube-VOS (Xu et al. 2018) and DAVIS (Perazzi et al. 2016) test sets as evaluation datasets. The YouTube-VOS test set consists of 508 video clips, and the DAVIS test set consists of 90 video clips. For the DAVIS test set, we follow the approach of ProPainter (Zhou et al. 2023) and E2FGVI (Li et al. 2022), using 50 video clips for evaluation. |
| Hardware Specification | Yes | Our method is implemented using the PyTorch framework and trained on four NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch and RAFT (Teed and Deng 2020) but does not provide specific version numbers for these key software components, which is required for a reproducible description. |
| Experiment Setup | Yes | In this paper, we set the batch size to 4, the initial learning rate to 10^-5, and train for a total of 100,000 iterations with the Adam (Kingma and Ba 2014) optimizer. |
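The reported setup (batch size 4, initial learning rate 10^-5, 100,000 iterations, Adam) can be sketched as follows. This is a minimal illustration only: the hyperparameters come from the paper, but the toy scalar objective is an assumption for demonstration, and in practice the authors' PyTorch training loop would use `torch.optim.Adam` rather than this hand-rolled update.

```python
# Hedged sketch of the paper's reported training configuration.
# Hyperparameters below are quoted from the paper; everything else
# (the toy objective, the hand-written Adam step) is illustrative.

BATCH_SIZE = 4          # from the paper
LEARNING_RATE = 1e-5    # initial learning rate 10^-5
TOTAL_ITERS = 100_000   # total training iterations

def adam_step(theta, grad, m, v, t, lr=LEARNING_RATE,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma and Ba 2014) with standard defaults."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)        # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)        # bias-corrected second moment
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Toy objective f(theta) = theta^2 with gradient 2*theta, standing in
# for the diffusion training loss; run a few steps, not the full 100k.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

With a consistently positive gradient, Adam's bias-corrected step is roughly the learning rate per iteration, so after 1,000 steps `theta` has decreased by about 0.01; the small learning rate reflects the fine-tuning regime typical for pretrained video diffusion models.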