IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models

Authors: Hang Guo, Yawei Li, Tao Dai, Shu-Tao Xia, Luca Benini

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show our IntLoRA achieves significant speedup on both training and inference without losing performance. Code is available at https://github.com/csguoh/IntLoRA. We evaluate our IntLoRA on various diffusion personalization tasks. Extensive experiments show that IntLoRA presents impressive efficiency and performance.
Researcher Affiliation | Academia | 1Tsinghua University 2ETH Zurich 3Shenzhen University 4Peng Cheng Laboratory. *Project Lead. Correspondence to: Yawei Li <EMAIL>, Tao Dai <EMAIL>. The institutions listed are Tsinghua University, ETH Zurich, Shenzhen University, and Peng Cheng Laboratory, all of which are academic or public research institutions.
Pseudocode | Yes | A. Summary of IntLoRA Algorithm. Before tuning, we pre-process the weights in Algo. 1, followed by the forward process of IntLoRA-MUL and IntLoRA-SHIFT in Algo. 2 and Algo. 3, respectively. Algorithm 1: The weight pre-process of the linear layer in IntLoRA. Algorithm 2: The forward process of the linear layer in IntLoRA-MUL. Algorithm 3: The forward process of the linear layer in IntLoRA-SHIFT.
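To make the quoted algorithm summary concrete, here is a minimal, illustrative sketch of the general mechanism it describes: a frozen base weight stored in quantized form combined with a low-rank update in the forward pass. This is a generic LoRA-over-quantized-weights example in plain Python, not a reproduction of the paper's Algorithms 1-3 (the `quantize` and `lora_forward` helpers and all tensor shapes are our own assumptions).

```python
# Illustrative sketch (NOT the paper's exact Algo. 1-3): a generic LoRA
# forward pass where the frozen base weight is kept in uniform symmetric
# quantized form and the low-rank update A @ B is added after dequantization.

def quantize(w, n_bits=8):
    """Uniform symmetric quantization of a weight matrix (list of lists)."""
    max_abs = max(abs(v) for row in w for v in row) or 1.0
    scale = max_abs / (2 ** (n_bits - 1) - 1)
    q = [[round(v / scale) for v in row] for row in w]
    return q, scale

def lora_forward(x, q_w, scale, a, b):
    """Compute y = x @ (dequant(q_w) + A @ B) for a flat input vector x."""
    rank, d_out = len(b), len(b[0])
    d_in = len(q_w)
    # dequantize the frozen base weight and add the low-rank update
    w_eff = [[q_w[i][j] * scale +
              sum(a[i][r] * b[r][j] for r in range(rank))
              for j in range(d_out)] for i in range(d_in)]
    return [sum(x[i] * w_eff[i][j] for i in range(d_in)) for j in range(d_out)]

# toy usage: a 2x2 base weight with a rank-1 adapter
w = [[0.5, -0.25], [0.125, 1.0]]
q_w, s = quantize(w)
a = [[0.1], [0.0]]   # d_in x rank
b = [[0.0, 0.2]]     # rank x d_out
y = lora_forward([1.0, 1.0], q_w, s, a, b)
```

The paper's contribution, per the quoted summary, is doing this adaptation arithmetic in the integer domain (multiplicatively in IntLoRA-MUL, via bit-shifts in IntLoRA-SHIFT) rather than dequantizing to floating point as above; see the repository for the actual algorithms.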
Open Source Code | Yes | Code is available at https://github.com/csguoh/IntLoRA.
Open Datasets | Yes | We evaluate on multiple adaptation tasks, including subject-driven generation (Ruiz et al., 2023), controllable generation (Zhang et al., 2023), and style-customized image generation (Sohn et al., 2023). For the subject-driven generation, we use a subset which contains 15 text-subject pairs from the DreamBooth (Ruiz et al., 2023) dataset...For controllable generation...on the ADE20K dataset (Zhou et al., 2017), Landmark to Face (L2F) on the CelebA-HQ dataset (Karras, 2017), and the Canny edge to Image (C2I) on the COCO dataset (Lin et al., 2014). For the style-customized generation, we employ the StyleDrop (Sohn et al., 2023) dataset...Specifically, we fine-tune the Llama3-8B model (Dubey et al., 2024) and use the MetaMathQA dataset (Yu et al., 2023) for training and the GSM8K dataset (Cobbe et al., 2021) for testing.
Dataset Splits | Yes | For the subject-driven generation, we use a subset which contains 15 text-subject pairs from the DreamBooth (Ruiz et al., 2023) dataset, for fast training and evaluation...For controllable generation, we fine-tune the model for 11 epochs for Canny-to-Image tasks and 20 epochs for Landmark-to-Face and Segmentation-to-Image tasks...For the style-customized generation, we employ the StyleDrop (Sohn et al., 2023) dataset, which includes 18 style images, and we use 6 text prompts for each style...we fine-tune the Llama3-8B model (Dubey et al., 2024) and use the MetaMathQA dataset (Yu et al., 2023) for training and the GSM8K dataset (Cobbe et al., 2021) for testing.
Hardware Specification | Yes | The training speed is tested on one NVIDIA RTX 3090 GPU.
Software Dependencies | No | The paper mentions using the "AdamW optimizer" but does not specify its version number or any other software dependencies with version details.
Experiment Setup | Yes | Implementation details. We employ Stable Diffusion V1.5 (Rombach et al., 2022) as the pre-trained backbone...For the subject-driven generation, we use the AdamW optimizer with a weight decay of 1e-2...The learning rate is set to 6e-5. The batch size is set to 1, and the number of training steps is 400. The rank of the LoRA is set to 4...For the controllable generation, we fine-tune the model for 11 epochs for Canny-to-Image tasks and 20 epochs for Landmark-to-Face and Segmentation-to-Image tasks. The learning rate is set to 1e-5 using the AdamW optimizer. The LoRA rank is set to 4. The batch size is set to 8 and the image resolution is 512×512...For the style-customized generation, we fine-tune the pre-trained model using the AdamW optimizer with a learning rate of 5e-5...We fine-tune for 500 steps with batch size 1.
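For quick reference, the subject-driven generation settings quoted above can be collected into a single config; this is our own sketch (the dict and key names are hypothetical, not the authors' code), shown alongside how it would typically map onto a PyTorch optimizer.

```python
# Hedged sketch of the subject-driven fine-tuning settings quoted in the
# Experiment Setup row; key names are our own, not taken from the repository.
subject_driven_cfg = {
    "backbone": "Stable Diffusion V1.5",
    "optimizer": "AdamW",
    "weight_decay": 1e-2,
    "learning_rate": 6e-5,
    "batch_size": 1,
    "train_steps": 400,
    "lora_rank": 4,
}

# In a PyTorch setup this would typically become, e.g.:
#   torch.optim.AdamW(trainable_params,
#                     lr=subject_driven_cfg["learning_rate"],
#                     weight_decay=subject_driven_cfg["weight_decay"])
```

The controllable-generation and style-customized runs reuse the same optimizer and LoRA rank but change the learning rate (1e-5 and 5e-5, respectively), batch size, and schedule, per the quoted details.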