VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Authors: Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Wangmeng Zuo

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | We have conducted experiments on extensive prompts under combinations of various T2V and T2I models. The results show that VideoElevator not only improves the performance of T2V baselines with foundational T2I, but also facilitates stylistic video synthesis with personalized T2I. [...] 5 Experiments 5.1 Experimental settings 5.2 Comparisons with T2V baselines 5.3 Ablation studies
Researcher Affiliation | Collaboration | 1 Harbin Institute of Technology, 2 Tongyi Lab
Pseudocode | No | The paper describes methods using text and mathematical equations (e.g., Eqn. 1 to Eqn. 11), but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/YBYBZhang/VideoElevator ; Project: https://videoelevator.github.io/
Open Datasets | Yes | We evaluate VideoElevator and other baselines on two benchmarks: (i) the VBench (Huang et al. 2023b) dataset, which covers a variety of content categories and contains 800 prompts; (ii) the Video Creation dataset, which unifies the creative prompt sets of Make-A-Video (Singer et al. 2023) and Video LDM (Blattmann et al. 2023b) and consists of 100 prompts in total. (A prompt-loading sketch appears after this table.)
Dataset Splits | No | The paper mentions using 'VBench' and the 'Video Creation dataset' and specifies their total numbers of prompts (800 and 100, respectively), but does not provide training, validation, or test splits or their proportions for these datasets.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions various models and frameworks such as 'Stable Diffusion V1.5 or V2.1-base', 'AnimateDiff', 'ZeroScope', 'LaVie', 'T2I', 'T2V', 'LDM', and 'U-Net', and evaluation metrics such as 'CLIP score', 'CLIP-IQA', and the 'LAION aesthetic predictor'. However, it does not provide specific version numbers for any programming languages, libraries, or other software dependencies used for implementation. (A metric-computation sketch appears after this table.)
Experiment Setup | Yes | Notably, when N is very small (e.g., N = 1), the synthesized video only contains coarse-grained motion, so we set N to 8-10 to add fine-grained motion (refer to Appendix B). [...] Empirically, applying temporal motion refining in just a few timesteps (i.e., 4-5 steps) can ensure temporal consistency (refer to Appendix B). (A sampling-schedule sketch appears after this table.)
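
On the datasets row: both benchmarks are plain prompt lists, so the evaluation inputs are easy to assemble. A minimal sketch, assuming each benchmark ships as a text file with one prompt per line; the file names below are placeholders, not the benchmarks' actual distribution format:

```python
# Hypothetical prompt loader; file names and one-prompt-per-line
# layout are assumptions, not the benchmarks' documented format.
from pathlib import Path

def load_prompts(path: str) -> list[str]:
    """Read one prompt per line, skipping blank lines."""
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]

vbench = load_prompts("vbench_prompts.txt")            # expected: 800 prompts
creation = load_prompts("video_creation_prompts.txt")  # expected: 100 prompts
print(len(vbench), len(creation))
```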
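
On the software-dependencies row: the cited CLIP score can be reproduced with off-the-shelf tooling. A minimal sketch using torchmetrics; averaging frame-text similarity over a video's frames and the choice of CLIP checkpoint are our assumptions, since the paper pins neither:

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# Checkpoint is our choice; the paper does not specify a CLIP variant.
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# frames: (T, 3, H, W) uint8 video frames; prompt: the text condition.
frames = torch.randint(0, 255, (16, 3, 224, 224), dtype=torch.uint8)
prompt = "a corgi surfing a wave at sunset"

# torchmetrics averages the per-frame image-text similarities (scaled by 100).
score = metric(frames, [prompt] * frames.shape[0])
print(f"CLIP score: {score.item():.2f}")
```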
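
On the experiment-setup row: the two hyperparameters (N T2V steps per refinement, and 4-5 refinement timesteps) suggest a sampling loop like the sketch below. This is only our reading of the setup; `t2v_refine` and `t2i_denoise_step` are hypothetical stubs, not the authors' released API:

```python
import torch

# Hypothetical stubs; the real method would invoke a T2V and a T2I
# diffusion model at these points.
def t2v_refine(latents, t, num_steps):
    return latents  # placeholder: temporal motion refining over N T2V steps

def t2i_denoise_step(latents, t):
    return latents  # placeholder: one T2I denoising step (quality elevating)

def sample(latents, timesteps, refine_at, n=8):
    for t in timesteps:
        if t in refine_at:                                 # only 4-5 selected timesteps
            latents = t2v_refine(latents, t, num_steps=n)  # N set to 8-10 in the paper
        latents = t2i_denoise_step(latents, t)
    return latents

latents = torch.randn(1, 4, 16, 64, 64)   # (B, C, T, H, W) video latents (assumed shape)
timesteps = list(range(50, 0, -1))        # a 50-step schedule (assumed)
refine_at = set(timesteps[::12][:4])      # 4 refinement timesteps, evenly spaced
out = sample(latents, timesteps, refine_at, n=8)
```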