Pyramidal Flow Matching for Efficient Video Generative Modeling

Authors: Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, Zhouchen Lin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that our method supports generating high-quality 5-second (up to 10-second) videos at 768p resolution and 24 FPS within 20.7k A100 GPU training hours. All code and models are open-sourced at https://pyramid-flow.github.io." (Section 4: Experiments)
Researcher Affiliation | Collaboration | 1 Peking University; 2 Kuaishou Technology; 3 Beijing University of Posts and Telecommunications; 4 State Key Lab of General AI, School of Intelligence Science and Technology, Peking University; 5 Institute for Artificial Intelligence, Peking University; 6 Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China
Pseudocode | Yes | Algorithm 1: Sampling with Pyramidal Flow Matching
Open Source Code | Yes | "All code and models are open-sourced at https://pyramid-flow.github.io."
Open Datasets | Yes | "Our model is trained on a mixed corpus of open-source image and video datasets. For images, we utilize a high-aesthetic subset of LAION-5B (Schuhmann et al., 2022), 11M from CC-12M (Changpinyo et al., 2021), a 6.9M non-blurred subset of SA-1B (Kirillov et al., 2023), 4.4M from JourneyDB (Sun et al., 2023), and 14M publicly available synthetic data. For video data, we incorporate WebVid-10M (Bain et al., 2021), OpenVid-1M (Nan et al., 2024), and another 1M high-resolution, non-watermarked videos primarily from the Open-Sora Plan (PKU-Yuan Lab et al., 2024)."
Dataset Splits | No | The paper uses well-known benchmark datasets for evaluation (VBench, EvalCrafter) and external datasets for training (LAION-5B, WebVid-10M, etc.), but it does not specify how these are split into training, validation, and test sets for the experiments conducted in this paper, nor does it define custom splits with percentages or sample counts.
Hardware Specification | Yes | "Our model undergoes a three-stage training procedure using 128 NVIDIA A100 GPUs."
Software Dependencies | No | The paper mentions the MM-DiT architecture from SD3 Medium, sinusoidal position encoding, 1D Rotary Position Embedding (RoPE), and the AdamW optimizer, but it does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python).
Experiment Setup | Yes | "The detailed training hyper-parameter settings for each optimization stage are reported in Table 4."
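The Pseudocode row refers to the paper's Algorithm 1, sampling with pyramidal flow matching. As a rough illustrative sketch of the general idea only — integrating a flow-matching ODE stage by stage from coarse to fine resolution, with upsampling and renoising at stage transitions — the Python below uses a placeholder `model` interface, made-up stage sizes, step counts, and renoising weights; it is not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def sample_pyramidal_flow(model, stages=(8, 16, 32), steps_per_stage=10):
    """Hypothetical coarse-to-fine flow-matching sampler.

    `model(x, t, stage)` is assumed to return a velocity field with the
    same shape as `x`; all constants here are illustrative, not the
    paper's settings.
    """
    # Start from pure Gaussian noise at the coarsest pyramid level.
    x = torch.randn(1, 4, stages[0], stages[0])
    for k, size in enumerate(stages):
        # Euler integration of the flow ODE within this stage.
        t_grid = torch.linspace(0.0, 1.0, steps_per_stage + 1)
        for i in range(steps_per_stage):
            t = t_grid[i].expand(x.shape[0])
            v = model(x, t, stage=k)  # predicted velocity at time t
            x = x + (t_grid[i + 1] - t_grid[i]) * v
        if k + 1 < len(stages):
            # Transition to the next (finer) stage: upsample the interim
            # latent and re-add noise so the finer stage starts from a
            # partially noised state (illustrative 50/50 mix).
            x = F.interpolate(x, size=stages[k + 1], mode="nearest")
            x = 0.5 * x + 0.5 * torch.randn_like(x)
    return x
```

For a quick smoke test, any callable with the assumed signature works, e.g. `sample_pyramidal_flow(lambda x, t, stage: -x)` returns a tensor at the finest resolution in the `stages` tuple.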