IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models
Authors: Hang Guo, Yawei Li, Tao Dai, Shu-Tao Xia, Luca Benini
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show our IntLoRA achieves significant speedup on both training and inference without losing performance. Code is available at https://github.com/csguoh/IntLoRA. We evaluate our IntLoRA on various diffusion personalization tasks. Extensive experiments show that IntLoRA presents impressive efficiency and performance. |
| Researcher Affiliation | Academia | 1Tsinghua University 2ETH Zürich 3Shenzhen University 4Peng Cheng Laboratory. *Project Lead. Correspondence to: Yawei Li <EMAIL>, Tao Dai <EMAIL>. The institutions listed are Tsinghua University, ETH Zürich, Shenzhen University, and Peng Cheng Laboratory, all of which are academic or public research institutions. |
| Pseudocode | Yes | A. Summary of IntLoRA Algorithm. Before tuning, we pre-process the weights in Algo. 1, followed by the forward process of IntLoRA-MUL and IntLoRA-SHIFT in Algo. 2 and Algo. 3, respectively. Algorithm 1: the weight pre-process of the linear layer in IntLoRA. Algorithm 2: the forward process of the linear layer in IntLoRA-MUL. Algorithm 3: the forward process of the linear layer in IntLoRA-SHIFT. |
| Open Source Code | Yes | Code is available at https://github.com/csguoh/IntLoRA. |
| Open Datasets | Yes | We evaluate on multiple adaptation tasks, including subject-driven generation (Ruiz et al., 2023), controllable generation (Zhang et al., 2023), and style-customized image generation (Sohn et al., 2023). For the subject-driven generation, we use a subset which contains 15 text-subject pairs from the DreamBooth (Ruiz et al., 2023) dataset...For controllable generation...on ADE20K dataset (Zhou et al., 2017), Landmark to Face (L2F) on CelebA-HQ dataset (Karras, 2017), and the Canny edge to Image (C2I) on the COCO dataset (Lin et al., 2014). For the style-customized generation, we employ the StyleDrop (Sohn et al., 2023) dataset...Specifically, we fine-tune the Llama3-8B model (Dubey et al., 2024) and use the MetaMathQA dataset (Yu et al., 2023) for training and GSM8K dataset (Cobbe et al., 2021) for testing. |
| Dataset Splits | Yes | For the subject-driven generation, we use a subset which contains 15 text-subject pairs from the DreamBooth (Ruiz et al., 2023) dataset, for fast training and evaluation...For controllable generation, we fine-tune the model for 11 epochs for Canny-to-Image tasks and 20 epochs for Landmark-to-Face and Segmentation-to-Image tasks...For the style-customized generation, we employ the StyleDrop (Sohn et al., 2023) dataset, which includes 18 style images, and we use 6 text prompts for each style...we fine-tune the Llama3-8B model (Dubey et al., 2024) and use the MetaMathQA dataset (Yu et al., 2023) for training and GSM8K dataset (Cobbe et al., 2021) for testing. |
| Hardware Specification | Yes | The training speed is tested on one NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using the "AdamW optimizer" but does not specify its version number or any other software dependencies with version details. |
| Experiment Setup | Yes | Implementation details. We employ the Stable Diffusion V1.5 (Rombach et al., 2022) as the pre-trained backbone...For the subject-driven generation, we use the AdamW optimizer with a weight decay of 1e-2...The learning rate is set to 6e-5. The batch size is set to 1, and the number of training steps is 400. The rank of the LoRA is set to 4...For the controllable generation, we fine-tune the model for 11 epochs for Canny-to-Image tasks and 20 epochs for Landmark-to-Face and Segmentation-to-Image tasks. The learning rate is set to 1e-5 using the AdamW optimizer. The LoRA rank is set to 4. The batch size is set to 8 and the image resolution is 512×512...For the style-customized generation, we fine-tune the pre-trained model using the AdamW optimizer with a learning rate of 5e-5...We fine-tune for 500 steps with batch size 1. |
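The setup above describes low-rank adaptation (rank 4) applied on top of a quantized pre-trained backbone. As a rough illustration of the general idea — not the paper's actual IntLoRA-MUL/SHIFT algorithms, whose details are in Algorithms 1–3 of the paper — the following sketch shows a generic LoRA update over a uniformly quantized linear layer. All shapes, the quantization scheme, and the initialization are illustrative assumptions; only the rank of 4 comes from the reported setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes; rank 4 matches the LoRA rank reported in the paper.
d_in, d_out, rank = 64, 64, 4

# Quantized pre-trained weight: int8 values plus a per-tensor scale
# (simple uniform quantization for illustration; the paper's scheme differs).
W_fp = rng.standard_normal((d_out, d_in)).astype(np.float32)
scale = np.abs(W_fp).max() / 127.0
W_int = np.round(W_fp / scale).astype(np.int8)

# Trainable low-rank factors, A initialized to zero as in standard LoRA,
# so the adapted layer starts identical to the (dequantized) base layer.
A = np.zeros((d_out, rank), dtype=np.float32)
B = rng.standard_normal((rank, d_in)).astype(np.float32) * 0.01

def forward(x: np.ndarray) -> np.ndarray:
    """y = (dequant(W_int) + A @ B) @ x  -- generic LoRA over a quantized base."""
    W = W_int.astype(np.float32) * scale  # dequantize the frozen base weight
    return (W + A @ B) @ x

x = rng.standard_normal(d_in).astype(np.float32)
y = forward(x)
print(y.shape)  # (64,)
```

During fine-tuning only `A` and `B` would receive gradients while `W_int` stays frozen in integer form; the paper's contribution is keeping the adaptation itself in integer arithmetic, which this float-space sketch does not capture.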