Disentangled Motion Modeling for Video Frame Interpolation
Authors: Jaihyun Lew, Jooyoung Choi, Chaehun Shin, Dahuin Jung, Sungroh Yoon
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments validate the effectiveness and efficiency of our proposed training scheme and architecture, demonstrating superior performance across various benchmarks in terms of perceptual metrics... Quantitative Results Tables 1 and 2 present our quantitative results across four benchmark datasets. MoMo achieves state-of-the-art on all four subsets of SNU-FILM... We conduct ablation studies to verify the effects of our design choices. |
| Researcher Affiliation | Academia | 1 Interdisciplinary Program in AI, Seoul National University; 2 Department of Electrical and Computer Engineering, Seoul National University; 3 School of Computer Science and Engineering, Soongsil University; 4 AIIS, ASRI and INMC, Seoul National University |
| Pseudocode | No | The paper describes the proposed MoMo framework, its two-stage training process, and architectural details in sections 3.1, 3.2, and 3.3, but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We train our model on the Vimeo90k dataset (Xue et al. 2019), using random 256×256 crops with augmentations like 90° rotation, flipping, and frame order reversing. We evaluate on well-known VFI benchmarks: Vimeo90k (Xue et al. 2019), SNU-FILM (Choi et al. 2020), Middlebury (others-set) (Baker et al. 2011), and Xiph (Montgomery and Lars 1994; Niklaus and Liu 2020), chosen for their broad motion diversity and magnitudes. |
| Dataset Splits | Yes | We train our model on the Vimeo90k dataset (Xue et al. 2019), using random 256×256 crops with augmentations like 90° rotation, flipping, and frame order reversing. We evaluate on well-known VFI benchmarks: Vimeo90k (Xue et al. 2019), SNU-FILM (Choi et al. 2020), Middlebury (others-set) (Baker et al. 2011), and Xiph (Montgomery and Lars 1994; Niklaus and Liu 2020), chosen for their broad motion diversity and magnitudes. |
| Hardware Specification | Yes | Runtime tests on an NVIDIA 32GB V100 GPU for 256×448 resolution frames, averaged over 100 iterations, reveal that our Convex-Up U-Net processes frames in approximately 145.49 ms each, achieving a 4.15× speedup over the standard U-Net and a 70% faster inference speed than the LDMVFI baseline. |
| Software Dependencies | No | We adopt pre-trained RAFT (Teed and Deng 2020) for optical flow model F... We use the standard timestep-conditioned U-Net architecture (UNet2DModel) from the diffusers library (von Platen et al. 2022)... The paper mentions specific software libraries and models like RAFT and the diffusers library, but it does not provide explicit version numbers for these components, which is required for reproducible software details. |
| Experiment Setup | No | Implementation Details: We train our model on the Vimeo90k dataset (Xue et al. 2019), using random 256×256 crops with augmentations like 90° rotation, flipping, and frame order reversing. We recommend the reader to refer to the Appendix for further details. The paper describes the dataset used, cropping, and augmentations, along with the composition of the loss function Ls = λ1·L1 + λp·Lp + λG·LG, but it does not provide specific hyperparameter values such as learning rates, batch sizes, optimizer settings, or the weights (λ values) for the loss components in the main text. |
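For clarity on the reproducibility gap flagged above: the paper states the supervision loss is a weighted sum Ls = λ1·L1 + λp·Lp + λG·LG but does not give the λ weights in the main text. A minimal sketch of that composition, with placeholder weights (the actual values are deferred to the paper's Appendix and are assumptions here):

```python
def composite_loss(l1_loss: float, perceptual_loss: float, gan_loss: float,
                   lam_1: float = 1.0, lam_p: float = 1.0, lam_g: float = 1.0) -> float:
    """Weighted sum of reconstruction (L1), perceptual (Lp), and GAN (LG) terms.

    The default lambda weights are PLACEHOLDERS: the paper does not report
    them in the main text, so a reproducer must obtain them from the Appendix.
    """
    return lam_1 * l1_loss + lam_p * perceptual_loss + lam_g * gan_loss


# Example with dummy per-term loss values and equal weighting.
total = composite_loss(0.5, 0.2, 0.1)
print(round(total, 6))
```

This is exactly the kind of detail the "No" rating refers to: the functional form is recoverable from the paper, but the weights are not.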