Cached Multi-LoRA Composition for Multi-Concept Image Generation

Authors: Xiandong Zou, Mingzhu Shen, Christos-Savvas Bouganis, Yiren Zhao

ICLR 2025

Reproducibility Assessment (Variable: Result. LLM Response)
Research Type: Experimental. Our experimental evaluations demonstrate that CMLoRA outperforms state-of-the-art training-free LoRA fusion methods by a significant margin: it achieves an average improvement of 2.19% in CLIPScore and 11.25% in MLLM win rate compared to LoraHub, LoRA Composite, and LoRA Switch. The paper also includes dedicated sections such as "3 EXPERIMENTS", "3.1 EXPERIMENTAL SETUP", and "3.2 RESULTS", along with numerous performance tables and figures.
Researcher Affiliation: Academia. Xiandong Zou, Mingzhu Shen, Christos-Savvas Bouganis, Yiren Zhao — Imperial College London, UK.
Pseudocode: No. The paper describes its methods using equations and figures, such as Figure 4 for the framework overview, but does not include a dedicated pseudocode or algorithm block with structured, code-like steps.
Open Source Code: Yes. The source code is released at https://github.com/Yqcca/CMLoRA.
Open Datasets: Yes. Based on the ComposLoRA testbed (Zhong et al., 2024), the authors curate two unique subsets of LoRAs representing realistic and anime styles.
Dataset Splits: No. The paper uses the ComposLoRA testbed and mentions curating subsets of LoRAs for evaluation. However, it does not explicitly provide training, validation, or test splits for image datasets, nor does it detail how images were partitioned for evaluation in the traditional machine-learning sense.
Hardware Specification: Yes. The experiments were run on a mix of NVIDIA A100 GPUs with 40GB memory and NVIDIA V100 GPUs with 16GB memory.
Software Dependencies: No. The paper mentions 'Diffusers (von Platen et al., 2022)', 'stable-diffusion-v1.5 implemented by Rombach et al. (2022) in PyTorch (Paszke et al., 2019)', and 'DPM-Solver++ proposed by Lu et al. (2022)'. While these are software components, specific version numbers (e.g., 'PyTorch 1.9') are not provided; only the publication years of the corresponding papers are given.
Experiment Setup: Yes. For the anime-style subset, the settings differ slightly: 200 denoising steps, a guidance scale s of 10, and an image size of 512×512. DPM-Solver++ (Lu et al., 2022) is used as the scheduler in the generation process. The LoRA scale for all LoRAs is set to 1.4, applied within the cross-attention module of the U-Net. The dominant weight scale w_dom is initially set to N^0.5, where N is the total number of activated LoRAs, and is then adjusted with a decaying schedule: for the i-th turn of switching the dominant LoRA, w_dom^i = w_dom^{i-1} · 0.5^i. In addition, c1 = 2 and c2 = 3 are selected for the caching strategy applied to non-dominant LoRAs. The hyper-parameters are selected via grid search, as described in Appendix F.
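The decaying dominant-weight schedule quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the initial scale is w_dom^0 = N^0.5 and that each switch i multiplies the previous weight by 0.5^i, exactly as the reported formula reads; the function name and signature are hypothetical.

```python
def dominant_weight_schedule(n_loras: int, n_switches: int) -> list[float]:
    """Sketch of the decaying dominant-LoRA weight scale described in the setup.

    Assumptions (from the reported formula):
      - initial scale: w_dom^0 = N ** 0.5, where N = number of activated LoRAs
      - at the i-th switch of the dominant LoRA: w_dom^i = w_dom^(i-1) * 0.5 ** i
    """
    w = n_loras ** 0.5          # w_dom^0 = sqrt(N)
    weights = [w]
    for i in range(1, n_switches + 1):
        w = w * 0.5 ** i        # w_dom^i = w_dom^(i-1) * 0.5^i
        weights.append(w)
    return weights
```

For example, with N = 4 activated LoRAs and three switches, the schedule decays rapidly: `dominant_weight_schedule(4, 3)` yields `[2.0, 1.0, 0.25, 0.03125]`, consistent with the idea that later dominant turns contribute progressively smaller weight.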