SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation
Authors: Teng Hu, Jiangning Zhang, Ran Yi, Hongrui Huang, Yabiao Wang, Lizhuang Ma
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the effectiveness of our method, we conduct experiments on various tasks, including backbone fine-tuning, downstream dataset fine-tuning, image customization, and controllable video generation (appendix). We compare our method with three state-of-the-art parameter-efficient fine-tuning methods: LoRA (Hu et al., 2021), Adaptformer (Chen et al., 2022), and LT-SFT (Ansell et al., 2021); along with the full-parameter fine-tuning method. We evaluate the generation models by three metrics: 1) Fréchet Inception Distance (FID) (Heusel et al., 2017), 2) CLIP Score, and 3) Visual-Linguistic Harmony Index (VLHI)... The results are shown in Fig. 6, which demonstrates that our method achieves the best FID scores, indicating our method effectively improves the performance of the pre-trained models on the main task. |
| Researcher Affiliation | Collaboration | Teng Hu1 , Jiangning Zhang2 , Ran Yi1 , Hongrui Huang1, Yabiao Wang2, Lizhuang Ma1 1 Shanghai Jiao Tong University 2 Youtu Lab, Tencent |
| Pseudocode | No | The paper describes the proposed methods (SaRA, nuclear norm-based low-rank constraint, progressive parameter adjustment, unstructural backpropagation) using textual descriptions and mathematical formulas, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is available at https://sjtuplayer.github.io/projects/SaRA. |
| Open Datasets | Yes | Specifically, we employ the pre-trained Stable Diffusion models on ImageNet (Deng et al., 2009), FFHQ (Karras et al., 2019), and CelebA-HQ (Karras et al., 2017) datasets, and fine-tune them on these pre-trained datasets for 10K iterations. |
| Dataset Splits | No | The paper mentions fine-tuning on pre-trained datasets for 10K iterations and computing FID between 5K generated data and 5K randomly sampled data from the source dataset, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for the experimental setup. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of 'mainstream deep learning libraries (such as PyTorch and TensorFlow)' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Specifically, we employ the pre-trained Stable Diffusion models on ImageNet (Deng et al., 2009), FFHQ (Karras et al., 2019), and CelebA-HQ (Karras et al., 2017) datasets, and fine-tune them on these pre-trained datasets for 10K iterations. |
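Since the paper provides no pseudocode, the two core ingredients it describes, selecting low-magnitude ("ineffective") parameters as the trainable subset and regularizing the learned update with a nuclear-norm low-rank constraint, can be sketched as follows. This is a minimal PyTorch illustration under our own assumptions (function names, the magnitude threshold, and the toy shapes are hypothetical), not the authors' implementation.

```python
import torch

def nuclear_norm_penalty(delta_w: torch.Tensor) -> torch.Tensor:
    # Nuclear norm = sum of singular values of the 2-D update matrix;
    # penalizing it encourages the learned update to be low-rank.
    return torch.linalg.matrix_norm(delta_w, ord="nuc")

def sparse_mask(weight: torch.Tensor, threshold: float) -> torch.Tensor:
    # Select the low-magnitude entries of a pre-trained weight as the
    # trainable subset (threshold value is an assumption for illustration).
    return (weight.abs() < threshold).float()

# Toy usage on one weight matrix (shapes and seed are arbitrary).
torch.manual_seed(0)
w0 = torch.randn(8, 8)                          # frozen pre-trained weights
mask = sparse_mask(w0, threshold=0.1)           # trainable low-magnitude entries
delta = (0.01 * torch.randn(8, 8)).requires_grad_()
effective_update = delta * mask                 # gradients flow only through masked entries
loss = nuclear_norm_penalty(effective_update)   # low-rank regularizer on the update
loss.backward()
```

In a full fine-tuning loop this penalty would be added to the task loss, and the paper's progressive parameter adjustment would periodically re-select the mask during training.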