Video Diffusion Models Are Strong Video Inpainter
Authors: Minhyeok Lee, Suhwan Cho, Chajin Shin, Jungho Lee, Sunghun Yang, Sangyoun Lee
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct various comparative experiments to demonstrate that the proposed FFF-VDI outperforms previous methods in both video completion and object removal across different scenarios. In Table 1, we quantitatively compare our proposed FFF-VDI with state-of-the-art methods on the YouTube-VOS and DAVIS datasets. |
| Researcher Affiliation | Academia | Yonsei University EMAIL |
| Pseudocode | No | The paper describes the proposed approach through textual descriptions and architectural diagrams (Figure 2, Figure 3) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about providing access to source code or a link to a code repository. |
| Open Datasets | Yes | To fairly compare previous state-of-the-art models with the proposed FFF-VDI, we use the YouTube-VOS (Xu et al. 2018) training set as the training data. Additionally, for model evaluation, we use the widely known YouTube-VOS (Xu et al. 2018) and DAVIS (Perazzi et al. 2016) test sets as evaluation datasets. |
| Dataset Splits | Yes | To fairly compare previous state-of-the-art models with the proposed FFF-VDI, we use the YouTube-VOS (Xu et al. 2018) training set as the training data. Additionally, for model evaluation, we use the widely known YouTube-VOS (Xu et al. 2018) and DAVIS (Perazzi et al. 2016) test sets as evaluation datasets. The YouTube-VOS test set consists of 508 video clips, and the DAVIS test set consists of 90 video clips. For the DAVIS test set, we follow the approach of ProPainter (Zhou et al. 2023) and E2FGVI (Li et al. 2022), using 50 video clips for evaluation. |
| Hardware Specification | Yes | Our method is implemented using the PyTorch framework and trained on four NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch and RAFT (Teed and Deng 2020) but does not provide specific version numbers for these key software components, which is required for a reproducible description. |
| Experiment Setup | Yes | In this paper, we set the batch size to 4, the initial learning rate to 10^-5, and train for a total of 100,000 iterations with the Adam (Kingma and Ba 2014) optimizer. |
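The reported setup (batch size 4, initial learning rate 10^-5, 100,000 iterations, Adam) can be sketched as follows. This is a minimal illustration only: the hyperparameters come from the paper, but the toy scalar objective is an assumption for demonstration, and in practice the authors' PyTorch training loop would use `torch.optim.Adam` rather than this hand-rolled update.

```python
# Hedged sketch of the paper's reported training configuration.
# Hyperparameters below are quoted from the paper; everything else
# (the toy objective, the hand-written Adam step) is illustrative.

BATCH_SIZE = 4          # from the paper
LEARNING_RATE = 1e-5    # initial learning rate 10^-5
TOTAL_ITERS = 100_000   # total training iterations

def adam_step(theta, grad, m, v, t, lr=LEARNING_RATE,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma and Ba 2014) with standard defaults."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)        # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)        # bias-corrected second moment
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Toy objective f(theta) = theta^2 with gradient 2*theta, standing in
# for the diffusion training loss; run a few steps, not the full 100k.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

With a consistently positive gradient, Adam's bias-corrected step is roughly the learning rate per iteration, so after 1,000 steps `theta` has decreased by about 0.01; the small learning rate reflects the fine-tuning regime typical for pretrained video diffusion models.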