FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Authors: Zhengyao Lyu, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate FasterCache on recent video diffusion models. Experimental results show that FasterCache can significantly accelerate video generation (e.g., 1.67× speedup on Vchitect-2.0) while keeping video quality comparable to the baseline, and consistently outperform existing methods in both inference speed and video quality. [...] To assess the performance of video synthesis acceleration methods, we focus primarily on two aspects, namely inference efficiency and visual quality. To evaluate inference efficiency, we employ Multiply-Accumulate Operations (MACs) and inference latency as metrics. We utilize VBench (Huang et al., 2024), LPIPS (Zhang et al., 2018), PSNR, and SSIM for visual quality evaluation. [...] We perform extensive ablation studies based on Open-Sora, synthesizing videos of 48 frames at 480P. |
| Researcher Affiliation | Academia | 1The University of Hong Kong 2S-Lab, Nanyang Technological University 3Shanghai Artificial Intelligence Laboratory |
| Pseudocode | No | The paper describes the methodology and procedures in textual and mathematical forms, including equations (Eq. 1-11), but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/Vchitect/FasterCache |
| Open Datasets | Yes | We apply our acceleration technique to different video synthesis diffusion models, including Open-Sora 1.2 (Zheng et al., 2024), Open-Sora-Plan (PKU-Yuan Lab and Tuzhan AI etc., 2024), Latte (Ma et al., 2024a), CogVideoX (Yang et al., 2024), and Vchitect-2.0 (Fan et al., 2025). [...] We utilize VBench (Huang et al., 2024), LPIPS (Zhang et al., 2018), PSNR, and SSIM for visual quality evaluation. VBench is a comprehensive benchmark suite for video generative models. |
| Dataset Splits | No | The paper does not provide specific details about dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing within its own experimental setup. It applies its method to existing models and uses the VBench for evaluation, which is a benchmark suite rather than a dataset with explicit splits for this paper's experiments. |
| Hardware Specification | Yes | (Lat denotes latency, measured on a single A100 GPU.) [...] All experiments are carried out on NVIDIA A100 80GB GPUs using PyTorch, with Flash Attention (Dao et al., 2022) enabled by default. |
| Software Dependencies | No | All experiments are carried out on NVIDIA A100 80GB GPUs using PyTorch, with Flash Attention (Dao et al., 2022) enabled by default. While PyTorch and Flash Attention are mentioned, specific version numbers for these software components are not provided, which is necessary for reproducibility. |
| Experiment Setup | Yes | All experiments conduct full attention inference for spatial and temporal attention modules every 2 timesteps to facilitate dynamic feature reuse. The weight w(t) increases linearly from 0 to 1 starting from the beginning of dynamic feature reuse until the end of sampling. For CFG output reuse, full inference is conducted every 5 timesteps, starting from 1/3 of the total sampling steps (e.g., for Open-Sora 1.2, which has 30 total sampling steps, this begins at step 10). The hyperparameters α1 and α2 are set to a default value of 0.2, which performs well for most models. |
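The schedule in the Experiment Setup row (full attention inference every 2 timesteps, CFG output reuse with full inference every 5 timesteps starting at 1/3 of the sampling steps, and a linearly increasing weight w(t)) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' code; the function and field names are hypothetical, and the exact onset of w(t) ("the beginning of dynamic feature reuse") is simplified here to a linear ramp over all steps. The real implementation is at https://github.com/Vchitect/FasterCache.

```python
def reuse_schedule(total_steps: int, attn_interval: int = 2,
                   cfg_interval: int = 5, cfg_start_frac: float = 1 / 3):
    """For each sampling step, decide whether to run full attention and
    full CFG inference or reuse cached features, and compute a linearly
    increasing reuse weight w(t). All names are illustrative."""
    cfg_start = int(total_steps * cfg_start_frac)  # e.g. step 10 of 30
    schedule = []
    for t in range(total_steps):
        # Full spatial/temporal attention inference every 2 timesteps.
        full_attn = (t % attn_interval == 0)
        # Before 1/3 of the steps, full CFG inference runs every step;
        # afterwards, full inference only every 5 timesteps.
        full_cfg = t < cfg_start or ((t - cfg_start) % cfg_interval == 0)
        # w(t) rises linearly from 0 to 1 (simplified onset; see lead-in).
        w = t / (total_steps - 1) if total_steps > 1 else 1.0
        schedule.append({"step": t, "full_attn": full_attn,
                         "full_cfg": full_cfg, "w": w})
    return schedule

# Open-Sora 1.2 uses 30 sampling steps, so CFG reuse begins at step 10.
sched = reuse_schedule(30)
```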