VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Authors: Yumeng Li, William H. Beluch, Margret Keuper, Dan Zhang, Anna Khoreva
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally showcase the superiority of our method in synthesizing longer, visually appealing videos over open-sourced T2V models. |
| Researcher Affiliation | Collaboration | 1: Amazon; 2: Bosch Center for Artificial Intelligence; 3: University of Mannheim; 4: Max Planck Institute for Informatics; 5: Zalando |
| Pseudocode | No | The paper describes methods and equations for Temporal Attention Regularization and Video Synopsis Prompting but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We plan to release the code upon acceptance. |
| Open Datasets | Yes | Experimental setting. To demonstrate the effectiveness of VSTAR in creating more dynamic videos, we run experiments and ablations on ChronoMagic-Bench-150 (Yuan et al., 2024) and prompts generated by ChatGPT (OpenAI, 2022) describing various visual transitions. [...] For our analysis, we use VideoCrafter2 (Chen et al., 2024a) along with videos from the DAVIS dataset (Perazzi et al., 2016) and additional videos collected from the web. |
| Dataset Splits | No | The paper mentions using ChronoMagic-Bench-150 and the DAVIS dataset but does not specify any training, validation, or test splits used for its experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not specify version numbers for any software libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | By default, we employ the state-of-the-art open-sourced T2V model VideoCrafter2 (Chen et al., 2024a) with 320 × 512 resolution as our base model, which is combined with the proposed video synopsis prompting (VSP) and temporal attention regularization (TAR). |