High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion
Authors: Junhwa Hur, Charles Herrmann, Saurabh Saxena, Janne Kontkanen, Wei-Sheng Lai, Yichang Shih, Michael Rubinstein, David J. Fleet, Deqing Sun
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | HiFI excels at high-resolution images and complex repeated textures that require global context, achieving comparable or state-of-the-art performance on various benchmarks (Vimeo, Xiph, X-TEST, and SEPE-8K). We further introduce a new dataset, LaMoR, that focuses on particularly challenging cases, and HiFI significantly outperforms other baselines. |
| Researcher Affiliation | Academia | David J. Fleet is also affiliated with the University of Toronto and the Vector Institute. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The project page https://hifi-diffusion.github.io is provided, but it is a project demonstration page and not a direct link to a source-code repository for the methodology described in the paper. |
| Open Datasets | Yes | HiFI excels at high-resolution images and complex repeated textures that require global context, achieving comparable or state-of-the-art performance on various benchmarks (Vimeo, Xiph, X-TEST, and SEPE-8K). Public benchmark evaluation: We first evaluate HiFI on three popular benchmark datasets, Vimeo-90K triplet (Xue et al. 2019), Xiph (Niklaus and Liu 2020), and X-TEST (Sim, Oh, and Kim 2021) in Table 1, as well as an 8K dataset, SEPE (Al Shoura et al. 2023). |
| Dataset Splits | No | The paper mentions using Vimeo-90K triplet and X-TRAIN for fine-tuning and evaluation, and following an evaluation protocol for X-TEST. However, it does not explicitly state specific percentages, sample counts, or direct references to standard splits with details for reproducibility within the paper's text. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software libraries, programming languages, or other dependencies. |
| Experiment Setup | Yes | We use a mini-batch size of 256 and train the base model for 3M iteration steps and the patch-based cascade model for 200k iteration steps. We use the Adam optimizer (Kingma and Ba 2014) with a constant learning rate of 1e-4 with initial warmup. For inference, we use a 3-stage patch-based cascade setup with a patch size of 512×768, averaging 4 samples estimated via 4 sampling steps. Our data augmentation includes random crop and horizontal, vertical, and temporal flip with a probability of 50%. We use a crop size of 352×480 for large-scale base model training and 224×288 for the cascade model training. We use a multi-resolution crop augmentation that crops an image patch with a random rectangular crop size between the original resolution and the final crop size and then resizes it to the final crop size. |
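The multi-resolution crop augmentation quoted above can be sketched as follows. This is a hypothetical reconstruction, not the authors' implementation: the function name `multi_resolution_crop`, the nearest-neighbor resize, and the `rng` parameter are all assumptions introduced for illustration; only the overall procedure (random rectangular crop between the final crop size and the full resolution, then resize to the final crop size) comes from the paper's description.

```python
import random
import numpy as np

def multi_resolution_crop(img: np.ndarray, final_hw=(352, 480), rng=None) -> np.ndarray:
    """Hedged sketch of the described augmentation: pick a random crop size
    between the final crop size and the original resolution, take a random
    crop of that size, then resize it down to the final crop size."""
    rng = rng or random.Random()
    fh, fw = final_hw
    h, w = img.shape[:2]
    # Random rectangular crop size between the final size and full resolution.
    ch = rng.randint(fh, h)
    cw = rng.randint(fw, w)
    # Random crop location within the image.
    top = rng.randint(0, h - ch)
    left = rng.randint(0, w - cw)
    patch = img[top:top + ch, left:left + cw]
    # Nearest-neighbor resize to (fh, fw); the paper does not specify the
    # interpolation method, so this choice is an assumption.
    rows = np.arange(fh) * ch // fh
    cols = np.arange(fw) * cw // fw
    return patch[rows][:, cols]

# Example: augment a 720p frame down to the base-model crop size.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
aug = multi_resolution_crop(frame, final_hw=(352, 480), rng=random.Random(0))
```

Sampling the crop size per example exposes the model to a range of effective image scales while keeping the training tensor shape fixed, which matches the stated goal of training at a fixed crop size.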