SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Authors: Teng Hu, Jiangning Zhang, Ran Yi, Hongrui Huang, Yabiao Wang, Lizhuang Ma

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the effectiveness of our method, we conduct experiments on various tasks, including backbone fine-tuning, downstream dataset fine-tuning, image customization, and controllable video generation (appendix). We compare our method with three state-of-the-art parameter-efficient fine-tuning methods: LoRA (Hu et al., 2021), AdaptFormer (Chen et al., 2022), and LT-SFT (Ansell et al., 2021), along with the full-parameter fine-tuning method. We evaluate the generation models by three metrics: 1) Fréchet Inception Distance (FID) (Heusel et al., 2017), 2) CLIP Score, and 3) Visual-Linguistic Harmony Index (VLHI)... The results are shown in Fig. 6, which demonstrates that our method achieves the best FID scores, indicating our method effectively improves the performance of the pre-trained models on the main task.
Researcher Affiliation | Collaboration | Teng Hu (1), Jiangning Zhang (2), Ran Yi (1), Hongrui Huang (1), Yabiao Wang (2), Lizhuang Ma (1); (1) Shanghai Jiao Tong University, (2) Youtu Lab, Tencent
Pseudocode | No | The paper describes the proposed methods (SaRA, the nuclear-norm-based low-rank constraint, progressive parameter adjustment, and unstructural backpropagation) using textual descriptions and mathematical formulas, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at https://sjtuplayer.github.io/projects/SaRA.
Open Datasets | Yes | Specifically, we employ the pre-trained Stable Diffusion models on ImageNet (Deng et al., 2009), FFHQ (Karras et al., 2019), and CelebA-HQ (Karras et al., 2017) datasets, and fine-tune them on these pre-trained datasets for 10K iterations.
Dataset Splits | No | The paper mentions fine-tuning on the pre-training datasets for 10K iterations and computing FID between 5K generated samples and 5K samples randomly drawn from the source dataset, but it does not specify explicit training, validation, or test splits (e.g., percentages, sample counts, or citations to predefined splits) for the experimental setup.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions the use of 'mainstream deep learning libraries (such as PyTorch and TensorFlow)' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Specifically, we employ the pre-trained Stable Diffusion models on ImageNet (Deng et al., 2009), FFHQ (Karras et al., 2019), and CelebA-HQ (Karras et al., 2017) datasets, and fine-tune them on these pre-trained datasets for 10K iterations.
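Since the paper itself ships no pseudocode (per the Pseudocode row above), the pipeline it describes — updating only "ineffective" low-magnitude parameters while encouraging the learned sparse increment to be low-rank via its nuclear norm — can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' implementation; the threshold `tau`, step size `lr`, and penalty weight `lam` are invented names for this sketch.

```python
import numpy as np

def sara_like_step(W, grad, tau=0.5, lr=1e-2, lam=1e-3):
    """One hypothetical update in the spirit of SaRA (illustration only).

    Entries of W with magnitude below tau are treated as 'ineffective'
    and are the only ones updated; the sparse increment is additionally
    nudged toward low rank with a nuclear-norm subgradient (U @ Vt).
    """
    mask = np.abs(W) < tau                    # sparse selection of small weights
    delta = np.where(mask, -lr * grad, 0.0)   # task-loss step on masked entries only
    # subgradient of the nuclear norm ||delta||_* is U @ Vt from its SVD
    U, _, Vt = np.linalg.svd(delta, full_matrices=False)
    delta = np.where(mask, delta - lr * lam * (U @ Vt), 0.0)
    return W + delta                          # entries with |W| >= tau stay frozen
```

A real implementation would also grow the mask over training (the paper's progressive parameter adjustment) and store gradients only for the selected entries (its unstructural backpropagation); those aspects are omitted here for brevity.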
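The FID metric cited in the rows above (computed between 5K generated and 5K real samples) has a standard closed form over Gaussian fits to Inception features; a minimal NumPy/SciPy sketch, assuming the feature extraction has already been done:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a, feats_b):
    """Frechet Inception Distance between two feature sets of shape (n, d),
    e.g. Inception-v3 pool features of generated vs. real images."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu_a - mu_b
    # ||mu_a - mu_b||^2 + Tr(cov_a + cov_b - 2 (cov_a cov_b)^{1/2})
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Identical feature sets give a score of (numerically) zero; larger scores indicate greater distributional mismatch, which is why lower FID is reported as better in the paper's comparisons.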