Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation
Authors: Qihua Chen, Yue Ma, Hongfa Wang, Junkun Yuan, Wenzhe Zhao, Qi Tian, Hongmei Wang, Shaobo Min, Qifeng Chen, Wei Liu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This question drives us to evaluate the capability of existing methods in tackling this difficult task. However, we find that they fall short due to limitations in GPU memory. To further explore their potential, we reduce the resolution of the source video through resizing and then resizing it back after outpainting (see details in Section 4). The results are depicted in Fig. 1. We observe that both M3DDM (Fan et al. 2023) and MOTIA (Wang et al. 2024a) produce low-quality results, e.g., blurry content and temporal inconsistencies. ... We conduct experiments with respect to the variations of these factors, see Fig. 3. ... Quantitative results. We compare methods in both high- and low-resolution settings. ... Qualitative results. In Fig. 7, we showcase the qualitative results. ... Ablation Study. We conduct the ablation study by outpainting the source video from 512×512 to 1440×810, as shown in Table 3. |
| Researcher Affiliation | Collaboration | 1Tencent, Shenzhen, China 2The Hong Kong University of Science and Technology, Hong Kong 3University of Science and Technology of China, Hefei 230027, China 4Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes methods in prose and figures (Fig 5, Fig 6) but does not include a dedicated section or block labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code: https://github.com/mayuelala/FollowYourCanvas |
| Open Datasets | Yes | Here, we employ a random subset (~1M video samples) of the public Panda-70M dataset (Chen et al. 2024) for training, improving reproducibility of our work. |
| Dataset Splits | No | Fan et al. 2023 use a private dataset with 5M video samples. Here, we employ a random subset (~1M video samples) of the public Panda-70M dataset (Chen et al. 2024) for training, improving reproducibility of our work. The paper describes using a ~1M video sample subset for training but does not provide specific details on how the dataset was split into training, validation, or testing sets, or the exact percentages/counts for each split needed to reproduce the data partitioning. |
| Hardware Specification | No | allowing us to perform outpainting within each window in parallel on separate GPUs, thereby accelerating the inference. The paper mentions using 'GPUs' and varying the number of GPUs for parallel inference, but does not specify the exact models or specifications of these GPUs (e.g., NVIDIA A100, Tesla V100). |
| Software Dependencies | No | Our implementation and model initialization are based on the popular video generation framework of AnimateDiff-V2 (Guo et al. 2024). The paper mentions basing their implementation on AnimateDiff-V2 but does not provide specific version numbers for any software dependencies like Python, PyTorch, or CUDA, which are needed for replication. |
| Experiment Setup | No | Due to the limitation of paper length, we leave more details about the training recipe, the design of the anchor and target windows, and the inference pipeline in the appendix and code. The paper explicitly states that details about the training recipe are left to the appendix and code, and thus are not provided in the main text. |
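The memory workaround quoted under Research Type (downscale the source video so baselines fit in GPU memory, outpaint, then resize back to the target resolution) can be sketched as below. This is a minimal illustration, not the authors' pipeline: `resize_nn` (nearest-neighbour resize) and `outpaint_stub` (edge-padding) are hypothetical stand-ins for whatever resampling method and outpainting model were actually used.

```python
import numpy as np

def resize_nn(frame: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize for an (H, W, C) frame."""
    h, w = frame.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[rows][:, cols]

def outpaint_stub(frame: np.ndarray, pad_h: int, pad_w: int) -> np.ndarray:
    """Hypothetical stand-in for an outpainting model: extend each side by edge-padding."""
    return np.pad(frame, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)), mode="edge")

def low_res_outpaint(frame: np.ndarray, scale: int, pad_h: int, pad_w: int) -> np.ndarray:
    """Downscale by `scale`, outpaint at low resolution, then upscale to the target size."""
    h, w = frame.shape[:2]
    small = resize_nn(frame, h // scale, w // scale)
    out_small = outpaint_stub(small, pad_h // scale, pad_w // scale)
    return resize_nn(out_small, h + 2 * pad_h, w + 2 * pad_w)

# The 512x512 -> 1440x810 setting from the ablation: pad 149 px vertically
# and 464 px horizontally on each side (512 + 2*149 = 810, 512 + 2*464 = 1440).
frame = np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)
result = low_res_outpaint(frame, scale=2, pad_h=149, pad_w=464)
print(result.shape)  # (810, 1440, 3)
```

As the quoted text notes, this round-tripping through a lower resolution is exactly what produces the blurry, temporally inconsistent outputs observed for the baselines.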