ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
Authors: Serin Yang, Taesung Kwon, Jong Chul Ye
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted a comparative study with four different keyframe interpolation baselines, including FILM (Reda et al., 2022), a conventional flow-based frame interpolation method, and three frame interpolation methods based on video diffusion models: TRF (Feng et al., 2024), DynamiCrafter (Xing et al., 2023), and Generative Inbetweening (Wang et al., 2024). We conducted these studies using the official implementations with default values, except for TRF, which has not been open-sourced yet. Qualitative evaluation. As illustrated in Fig. 4, our model clearly outperforms the other methods in terms of motion consistency and identity preservation. Quantitative evaluation. For quantitative evaluation, we used LPIPS (Zhang et al., 2018) and FID (Heusel et al., 2017) to assess the quality of the generated frames, and FVD (Unterthiner et al., 2019) to evaluate the overall quality of the generated videos. As shown in Table 1, our method surpasses the other baselines in terms of fidelity. |
| Researcher Affiliation | Academia | Serin Yang1 , Taesung Kwon2 , Jong Chul Ye1 1Kim Jaechul Graduate School of AI, KAIST 2Dept. of Bio & Brain Engineering, KAIST EMAIL |
| Pseudocode | Yes | The detailed algorithm is provided in Algorithm 1. The vanilla bidirectional sampling can be implemented by removing DDS guidance (orange) and replacing the CFG++ update (blue) with a traditional CFG update. The detailed algorithm of the vanilla bidirectional sampling is provided in Appendix A. Algorithm 1 ViBiDSampler |
| Open Source Code | No | Project page: https://vibidsampler.github.io/ (Unofficial implementation: https://github.com/YingHuan-Chen/Time-Reversal - for TRF, not for our method) |
| Open Datasets | Yes | Dataset. The high-resolution (1080p) video datasets used for evaluation are sourced from the DAVIS dataset (Pont-Tuset et al., 2017) and the Pexels dataset1. For the DAVIS dataset, we preprocessed 100 videos into 100 video-keyframe pairs, with each video consisting of 25 frames. This dataset includes a wide range of large and varied motions, such as surfing, dancing, driving, and airplane flying. For the Pexels dataset, we collected 45 videos, primarily featuring scene motions, natural movements, directional animal movements, and sports actions. We used the first and last frames from each video as keyframes for our evaluation. 1https://www.pexels.com/ |
| Dataset Splits | No | The paper mentions preprocessing 100 videos from DAVIS and collecting 45 videos from Pexels, but does not specify training/test/validation splits for these datasets for the experimental evaluation. It states 'We used the first and last frames from each video as keyframes for our evaluation' but not how the datasets themselves were partitioned for evaluation. |
| Hardware Specification | Yes | On a single 3090 GPU, our method can interpolate 25 frames at 1024×576 resolution in just 195 seconds, establishing it as a leading solution for keyframe interpolation. All evaluations were performed on a single NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions using specific schedulers and diffusion models like "Euler scheduler" and "Stable Video Diffusion (SVD)" within the "EDM-framework", but it does not provide specific version numbers for these software components or any programming languages or libraries (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For the sampling process, we used the Euler scheduler with 25 timesteps for both forward and backward sampling. The motion bucket ID was fixed at 127, and the decoding frame number was set to 4 due to memory limitations on an NVIDIA RTX 3090 GPU. All other parameters followed the default settings from SVD. Since the micro-conditioning fps is sensitive to the data, we applied a lower fps for cases with large motion and a higher fps for cases with smaller motion. Figure 6: Effect of CFG++ guidance scale. The rows, from top to bottom, correspond to the CFG++ scales of 0.6, 0.8, and 1.0. |
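The vanilla bidirectional sampling referenced above (Algorithm 1 without the DDS guidance and CFG++ terms) can be sketched as a toy loop: at each noise level, take one forward Euler step, re-noise the result back to the current level, then take a backward step along the time-reversed frame axis. This is a minimal illustrative sketch only; `dummy_denoise`, the scalar "frames", the noise schedule, and the re-noising scale are placeholder assumptions, not the paper's SVD-based implementation.

```python
import random

def dummy_denoise(x, sigma):
    # Placeholder for the video diffusion denoiser D_theta(x; sigma);
    # the real method uses Stable Video Diffusion conditioned on a keyframe.
    return [v / (1.0 + sigma) for v in x]

def euler_step(x, sigma, sigma_next):
    # One Euler step of the probability-flow ODE (EDM parameterization):
    # x_next = x + (sigma_next - sigma) * (x - denoised) / sigma
    denoised = dummy_denoise(x, sigma)
    return [xi + (sigma_next - sigma) * (xi - di) / sigma
            for xi, di in zip(x, denoised)]

def bidirectional_sample(x_init, sigmas):
    # Vanilla bidirectional sampling sketch: forward step (start-frame
    # direction), re-noise back to the current level, then a backward step
    # on the temporally reversed frames (end-frame direction).
    x = list(x_init)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x_fwd = euler_step(x, sigma, sigma_next)
        # Re-noise from sigma_next back up to sigma before the backward pass.
        scale = (sigma**2 - sigma_next**2) ** 0.5
        x_renoised = [v + scale * random.gauss(0.0, 1.0) for v in x_fwd]
        x = euler_step(x_renoised[::-1], sigma, sigma_next)[::-1]
    return x
```

With the paper's settings one would use 25 timesteps (here the schedule length is arbitrary); the actual method additionally applies DDS guidance and a CFG++ update at each step, which this sketch omits.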