Real-Time Video Generation with Pyramid Attention Broadcast
Authors: Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present Pyramid Attention Broadcast (PAB), a real-time, high-quality and training-free approach for DiT-based video generation. Our method is founded on the observation that attention difference in the diffusion process exhibits a U-shaped pattern, indicating significant redundancy. We mitigate this by broadcasting attention outputs to subsequent steps in a pyramid style. It applies different broadcast strategies to each attention based on its variance for best efficiency. We further introduce broadcast sequence parallel for more efficient distributed inference. PAB demonstrates up to 10.5× speedup across three models compared to baselines, achieving real-time generation for up to 720p videos. Section 3 is dedicated to 'EXPERIMENTS', where models, metrics, baselines, and implementation details are discussed, and results are presented in tables and figures. |
| Researcher Affiliation | Academia | Xuanlei Zhao (1), Xiaolong Jin (2), Kai Wang (1), Yang You (1); (1) National University of Singapore, (2) Purdue University. Code: NUS-HPC-AI-Lab/VideoSys. All authors are affiliated with universities. |
| Pseudocode | No | The paper describes the proposed method, Pyramid Attention Broadcast (PAB), and its components, including broadcast sequence parallel, through textual descriptions, figures (e.g., Figure 5: Overview of Pyramid Attention Broadcast), and mathematical formulations (Equations 1 and 2), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: NUS-HPC-AI-Lab/VideoSys |
| Open Datasets | Yes | We generate videos based on VBench's (Huang et al., 2024) prompts. To further evaluate the efficacy of our method, we expand our analysis using a subset of 1000 videos from WebVid (Bain et al., 2021), a large-scale text-video dataset sourced from stock footage websites. |
| Dataset Splits | Yes | We generate videos based on VBench's (Huang et al., 2024) prompts. To further evaluate the efficacy of our method, we expand our analysis using a subset of 1000 videos from WebVid (Bain et al., 2021), a large-scale text-video dataset sourced from stock footage websites. |
| Hardware Specification | Yes | All experiments are carried out on NVIDIA H100 80GB GPUs with PyTorch. Latency is measured on 8 H100 GPUs. We evaluate the latency and speedup achieved by PAB246/PAB235 (the strategy with the best quality, but less speedup) for single-video generation across up to 8 NVIDIA H100 GPUs. |
| Software Dependencies | No | The paper mentions PyTorch and Flash Attention but does not provide specific version numbers for these software components. It states: 'All experiments are carried out on the NVIDIA H100 80GB GPUs with Pytorch.' and 'We enable Flash Attention (Dao et al., 2022) by default for all experiments.' |
| Experiment Setup | Yes | Table 5 gives the inference config of the three models (model / scheduler / inference steps): Open-Sora: RFLOW, 30; Open-Sora-Plan: PNDM, 150; Latte: DDIM, 50. In Section A.2 (PAB Generation Settings), Table 6 details the attention broadcast configuration, including diffusion timesteps and broadcast ranges (e.g., PAB246 means spatial 2, temporal 4, cross 6, with specific diffusion timesteps), and Table 7 provides the MLP broadcast configuration, including diffusion timesteps, block indices, and broadcast ranges. |
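The core mechanism the review describes — broadcasting (reusing) attention outputs for a fixed number of subsequent diffusion steps, with a different range per attention type — can be sketched as a small cache. This is a minimal illustration under stated assumptions, not the paper's implementation: the `BroadcastCache` class and `attend` method are hypothetical names, and the ranges follow the PAB246 naming quoted above (spatial 2, temporal 4, cross 6).

```python
# Sketch of PAB-style attention broadcast: recompute an attention output
# only when its broadcast range is exhausted; otherwise reuse the cached
# output from an earlier diffusion step. Names here are illustrative.

# Broadcast ranges per attention type, following the PAB246 naming.
BROADCAST_RANGES = {"spatial": 2, "temporal": 4, "cross": 6}

class BroadcastCache:
    def __init__(self, ranges):
        self.ranges = ranges
        self.cache = {}       # attn_type -> last computed output
        self.last_step = {}   # attn_type -> step at which it was computed

    def attend(self, attn_type, step, compute_fn):
        """Return a cached output while within the broadcast range,
        otherwise call compute_fn() (the real attention) and re-cache."""
        rng = self.ranges[attn_type]
        if attn_type in self.cache and step - self.last_step[attn_type] < rng:
            return self.cache[attn_type]   # broadcast the cached output
        out = compute_fn()                 # actual attention computation
        self.cache[attn_type] = out
        self.last_step[attn_type] = step
        return out
```

Over 12 diffusion steps, this schedule recomputes spatial attention every 2 steps (6 calls), temporal every 4 (3 calls), and cross-attention every 6 (2 calls), which is where the claimed redundancy savings come from.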