VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Authors: Yumeng Li, William H Beluch, Margret Keuper, Dan Zhang, Anna Khoreva

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We experimentally showcase the superiority of our method in synthesizing longer, visually appealing videos over open-sourced T2V models." |
| Researcher Affiliation | Collaboration | 1Amazon, 2Bosch Center for Artificial Intelligence, 3University of Mannheim, 4Max Planck Institute for Informatics, 5Zalando |
| Pseudocode | No | The paper describes methods and equations for Temporal Attention Regularization and Video Synopsis Prompting but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | "We plan to release the code upon acceptance." |
| Open Datasets | Yes | "Experimental setting. To demonstrate the effectiveness of VSTAR in creating more dynamic videos, we run experiments and ablations on ChronoMagic-Bench-150 (Yuan et al., 2024) and prompts generated by ChatGPT (OpenAI, 2022) describing various visual transitions. [...] For our analysis, we use VideoCrafter2 (Chen et al., 2024a) along with videos from the DAVIS dataset (Perazzi et al., 2016) and additional videos collected from the web." |
| Dataset Splits | No | The paper mentions using ChronoMagic-Bench-150 and the DAVIS dataset but does not specify any training, validation, or test splits used for its experiments. |
| Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not specify version numbers for any software libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | "By default, we employ the state-of-the-art open-sourced T2V model VideoCrafter2 (Chen et al., 2024a) with 320 × 512 resolution as our base model, which is combined with the proposed video synopsis prompting (VSP) and temporal attention regularization (TAR)." |