T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Authors: Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that T-Stitch is training-free, generally applicable for different architectures, and complements most existing fast sampling techniques with flexible speed and quality trade-offs. On DiT-XL, for example, 40% of the early timesteps can be safely replaced with a 10x faster DiT-S without performance drop on class-conditional ImageNet generation. We further show that our method can also be used as a drop-in technique to not only accelerate the popular pretrained stable diffusion (SD) models but also improve the prompt alignment of stylized SD models from the public model zoo. |
| Researcher Affiliation | Collaboration | 1Monash University 2NVIDIA 3University of Wisconsin, Madison 4Caltech |
| Pseudocode | No | The paper describes methods conceptually and with mathematical equations, but does not include any explicit 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper mentions using a 'public model zoo on Diffusers (von Platen et al., 2022)' and 'huggingface.co and civitai.com', which are third-party resources or communities. It does not provide any explicit statement from the authors about releasing their own code for the methodology described in this paper. |
| Open Datasets | Yes | Following DiT, we conduct the class-conditional ImageNet experiments based on pretrained DiT-S/B/XL under 256×256 images and patch size of 2. |
| Dataset Splits | Yes | We use the reference batch from ADM (Dhariwal & Nichol, 2021) and sample 5,000 images to compute FID. |
| Hardware Specification | Yes | For example, even with a high-performance RTX 3090, generating 8 images with DiT-XL (Peebles & Xie, 2022) takes 16.5 seconds with 100 denoising steps, which is 10× slower than its smaller counterpart DiT-S (1.7s) with a lower generation quality. |
| Software Dependencies | No | The paper mentions 'Diffusers' implicitly through a citation, but no specific version numbers are provided for any software libraries or dependencies used for the implementation of T-Stitch. |
| Experiment Setup | Yes | By default, we adopt a classifier-free guidance scale of 1.5 as it achieves the best FID for DiT-XL, which is also the target model in our setting. |
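The core idea the report describes (handing the early, high-noise fraction of the denoising trajectory to a small model and the rest to the large one) can be sketched as follows. This is a minimal illustration with stand-in denoisers and a hypothetical `stitched_sample` helper, not the paper's actual DiT implementation or sampler.

```python
def small_denoiser(x, t):
    # Stand-in for a fast model such as DiT-S (hypothetical).
    return x * 0.9

def large_denoiser(x, t):
    # Stand-in for a high-quality model such as DiT-XL (hypothetical).
    return x * 0.95

def stitched_sample(x, num_steps=100, small_fraction=0.4):
    """Trajectory stitching sketch: run `small_fraction` of the early
    (high-noise) steps with the small model, then switch to the large
    model for the remaining steps."""
    switch_step = int(num_steps * small_fraction)
    schedule = []
    for step in range(num_steps):
        denoiser = small_denoiser if step < switch_step else large_denoiser
        schedule.append(denoiser.__name__)
        x = denoiser(x, step)
    return x, schedule

# With 10 steps and small_fraction=0.4, the first 4 steps use the
# small model and the last 6 use the large one.
x, schedule = stitched_sample(1.0, num_steps=10, small_fraction=0.4)
```

With `small_fraction=0.4` this mirrors the table's claim that 40% of the early timesteps can be delegated to the smaller model; sweeping the fraction yields the speed/quality trade-off curve the paper reports.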