FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Authors: Zhengyao Lyu, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate FasterCache on recent video diffusion models. Experimental results show that FasterCache can significantly accelerate video generation (e.g., 1.67× speedup on Vchitect-2.0) while keeping video quality comparable to the baseline, and consistently outperform existing methods in both inference speed and video quality. [...] To assess the performance of video synthesis acceleration methods, we focus primarily on two aspects, namely inference efficiency and visual quality. To evaluate inference efficiency, we employ Multiply-Accumulate Operations (MACs) and inference latency as metrics. We utilize VBench (Huang et al., 2024), LPIPS (Zhang et al., 2018), PSNR, and SSIM for visual quality evaluation. [...] We perform extensive ablation studies based on Open-Sora, synthesizing videos of 48 frames at 480P. |
| Researcher Affiliation | Academia | 1The University of Hong Kong 2S-Lab, Nanyang Technological University 3Shanghai Artificial Intelligence Laboratory |
| Pseudocode | No | The paper describes the methodology and procedures in textual and mathematical forms, including equations (Eq. 1-11), but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/Vchitect/FasterCache |
| Open Datasets | Yes | We apply our acceleration technique to different video synthesis diffusion models, including Open-Sora 1.2 (Zheng et al., 2024), Open-Sora-Plan (PKU-Yuan Lab and Tuzhan AI etc., 2024), Latte (Ma et al., 2024a), CogVideoX (Yang et al., 2024), and Vchitect-2.0 (Fan et al., 2025). [...] We utilize VBench (Huang et al., 2024), LPIPS (Zhang et al., 2018), PSNR, and SSIM for visual quality evaluation. VBench is a comprehensive benchmark suite for video generative models. |
| Dataset Splits | No | The paper does not provide specific details about dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing within its own experimental setup. It applies its method to existing models and uses the VBench for evaluation, which is a benchmark suite rather than a dataset with explicit splits for this paper's experiments. |
| Hardware Specification | Yes | (Lat denotes latency, measured on a single A100 GPU.) [...] All experiments are carried out on NVIDIA A100 80GB GPUs using PyTorch, with Flash Attention (Dao et al., 2022) enabled by default. |
| Software Dependencies | No | All experiments are carried out on NVIDIA A100 80GB GPUs using PyTorch, with Flash Attention (Dao et al., 2022) enabled by default. While PyTorch and Flash Attention are mentioned, specific version numbers for these software components are not provided, which is necessary for reproducibility. |
| Experiment Setup | Yes | All experiments conduct full attention inference for spatial and temporal attention modules every 2 timesteps to facilitate dynamic feature reuse. The weight w(t) increases linearly from 0 to 1 starting from the beginning of dynamic feature reuse until the end of sampling. For CFG output reuse, full inference is conducted every 5 timesteps, starting from 1/3 of the total sampling steps (e.g., for Open-Sora 1.2, which has 30 total sampling steps, this begins at step 10). The hyperparameters α1 and α2 are set to a default value of 0.2, which performs well for most models. |
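The schedule in the Experiment Setup row (full attention inference every 2 timesteps, CFG output reuse with full inference every 5 timesteps starting at 1/3 of the sampling steps, and a linearly increasing weight w(t)) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' code; the function and field names are hypothetical, and the exact onset of w(t) ("the beginning of dynamic feature reuse") is simplified here to a linear ramp over all steps. The real implementation is at https://github.com/Vchitect/FasterCache.

```python
def reuse_schedule(total_steps: int, attn_interval: int = 2,
                   cfg_interval: int = 5, cfg_start_frac: float = 1 / 3):
    """For each sampling step, decide whether to run full attention and
    full CFG inference or reuse cached features, and compute a linearly
    increasing reuse weight w(t). All names are illustrative."""
    cfg_start = int(total_steps * cfg_start_frac)  # e.g. step 10 of 30
    schedule = []
    for t in range(total_steps):
        # Full spatial/temporal attention inference every 2 timesteps.
        full_attn = (t % attn_interval == 0)
        # Before 1/3 of the steps, full CFG inference runs every step;
        # afterwards, full inference only every 5 timesteps.
        full_cfg = t < cfg_start or ((t - cfg_start) % cfg_interval == 0)
        # w(t) rises linearly from 0 to 1 (simplified onset; see lead-in).
        w = t / (total_steps - 1) if total_steps > 1 else 1.0
        schedule.append({"step": t, "full_attn": full_attn,
                         "full_cfg": full_cfg, "w": w})
    return schedule

# Open-Sora 1.2 uses 30 sampling steps, so CFG reuse begins at step 10.
sched = reuse_schedule(30)
```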