Faster Diffusion Through Temporal Attention Decomposition

Authors: Haozhe Liu, Wentian Zhang, Jinheng Xie, Francesco Faccio, Mengmeng Xu, Tao Xiang, Mike Zheng Shou, Juan-Manuel Perez-Rua, Jürgen Schmidhuber

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that, when widely applied to various existing text-conditional diffusion models, TGATE accelerates these models by 10%-50%. The code of TGATE is available at https://github.com/HaozheLiu-ST/T-GATE."
Researcher Affiliation | Collaboration | "1 Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST); 2 Show Lab, National University of Singapore (NUS); 3 Swiss AI Lab IDSIA, USI & SUPSI, Lugano; 4 Meta AI"
Pseudocode | No | The paper describes its methods only in paragraph text and does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "The code of TGATE is available at https://github.com/HaozheLiu-ST/T-GATE."
Open Datasets | Yes | "Comprehensive experiments are conducted using the MS-COCO (Lin et al., 2014), MJHQ (Li et al., 2023), OpenSora-Sample (Lab & etc., 2024), and DPG-Bench (Hu et al., 2024) datasets."
Dataset Splits | Yes | "Similar to a previous study (Podell et al., 2023), 10k images from the MS-COCO validation set (Lin et al., 2014) are used to evaluate the zero-shot generation performance. ... The generated images are set at a resolution of 1024×1024, with a total of 10k samples."
Hardware Specification | Yes | "The latency of generating one image is tested on a 1080 Ti commercial card. ... The computational platform is a single V100 GPU card with pytorch 2.2."
Software Dependencies | Yes | "The computational platform is a single V100 GPU card with pytorch 2.2."
Experiment Setup | Yes | "The inference configuration, including the number of inference steps and the noise scheduler, follows the default settings for each model. Additionally, the proposed method is compared with other accelerating methods... For PixArt, parameters are set to m = 15 and k = 3, whereas for SDXL, we utilize m = 10 and k = 5."