Survey of Video Diffusion Models: Foundations, Implementations, and Applications
Authors: Yimu Wang, Xuye Liu, Wei Pang, Li Ma, Shuai Yuan, Paul Debevec, Ning Yu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This survey provides a comprehensive review of diffusion-based video generation, examining its evolution, technical foundations, and practical applications. We present a systematic taxonomy of current methodologies, analyze architectural innovations and optimization strategies, and investigate applications across low-level vision tasks such as denoising and super-resolution. This survey serves as a foundational resource for researchers and practitioners working at the intersection of diffusion models and video generation, providing insights into both the theoretical frameworks and practical implementations that drive this rapidly evolving field. |
| Researcher Affiliation | Collaboration | Yimu Wang, University of Waterloo; Xuye Liu, University of Waterloo; Wei Pang, University of Waterloo; Li Ma, Netflix Eyeline Studios; Shuai Yuan, Duke University; Paul Debevec, Netflix Eyeline Studios; Ning Yu, Netflix Eyeline Studios |
| Pseudocode | Yes | Algorithm 1 Classifier-guided DDPM sampling, given a diffusion model (µθ(xt), Σθ(xt)), classifier pϕ(y|xt), and gradient scale s. Algorithm 2 Classifier-guided DDIM sampling, given a diffusion model ϵθ(xt), classifier pϕ(y|xt), and gradient scale s. Algorithm 3 Joint training a diffusion model with classifier-free guidance Algorithm 4 Conditional sampling with classifier-free guidance |
| Open Source Code | Yes | A structured list of related works involved in this survey is also available on GitHub: https://github.com/EyelineLabs/Survey-Video-Diffusion. |
| Open Datasets | Yes | Table 2: The overview of most popular datasets used in training video generation models. We also include image datasets as they are usually used in training. I, V, T, and A represent image, video, text, and audio. Other commercial datasets include those released by Pond5, Adobe Stock, Shutterstock, Getty, Coverr, Videvo, Depositphotos, Storyblocks, Dissolve, Freepik, Vimeo, and Envato. ... UCF-101 (Soomro et al., 2012) |
| Dataset Splits | No | The paper is a survey and does not present its own experimental results requiring dataset splits. While it mentions various datasets used in other works, it does not provide specific split information for reproducibility of experiments conducted within this paper. |
| Hardware Specification | Yes | Table 1: Comparison of modules and parameters in different diffusion generative models and their industry applications. ... CogVideo (Hong et al., 2023a) ... 8 RTX 6000 ... MagicVideo (Zhou et al., 2022) ... 1 A100 ... Open-Sora (Zheng et al., 2024c) ... 8 H100s |
| Software Dependencies | No | The paper mentions several software components, tools, and models like Flash Attention, ZeRO, Qwen2-VL, CLIP, and GPT-4 Vision. However, it does not provide specific version numbers for these or any other key software dependencies required for replication. |
| Experiment Setup | No | The paper is a survey of video diffusion models and reviews various methodologies and implementations. While it discusses training engineering techniques such as 'multi-resolution frame pack strategy' and 'progressive training strategy', it does not provide specific hyperparameters like learning rates, batch sizes, or optimizer settings for any model's training setup within the main text. |
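The pseudocode row above quotes the survey's Algorithms 3-4, which cover sampling with classifier-free guidance. As a minimal illustration of that guidance step (a generic sketch, not the paper's code; `cfg_noise_estimate` and the random stand-ins for the network outputs are hypothetical names introduced here):

```python
import numpy as np

def cfg_noise_estimate(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance (Ho & Salimans, 2022): blend the
    conditional and unconditional noise predictions of the same
    diffusion network. guidance_scale = 1.0 recovers the purely
    conditional prediction; larger values push samples toward the
    condition at the cost of diversity."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy usage with random arrays standing in for the two network outputs.
rng = np.random.default_rng(0)
eps_c = rng.standard_normal((4, 8))   # eps_theta(x_t, y): conditional branch
eps_u = rng.standard_normal((4, 8))   # eps_theta(x_t, None): unconditional branch
eps = cfg_noise_estimate(eps_c, eps_u, guidance_scale=7.5)
```

In a full sampler this blended estimate `eps` would replace the raw network output inside each DDPM/DDIM denoising step; the guidance scale plays the role of the gradient scale `s` quoted in the pseudocode row.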