LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation

Authors: Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse, Fatih Porikli

ICLR 2025

Reproducibility Variable: Result — LLM Response
Research Type: Experimental — "Our extensive experiments demonstrate the effectiveness of LoRA-X for text-to-image generation, including Stable Diffusion v1.5 and Stable Diffusion XL." Section 5 (Experiment): "This section describes our experiments evaluating the effectiveness of LoRA-X, detailing the setup and presenting our evaluation. We analyze and quantify LoRA-X through text-to-image generation experiments in Section 5.1 and text-generation experiments in Appendix E.3."
Researcher Affiliation: Industry — "Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse, Fatih Porikli. Qualcomm AI Research, San Diego, CA 92121, USA."
Pseudocode: Yes — "Algorithm 1: Simplistic PyTorch-style pseudocode for LoRA-X transfer."
Open Source Code: No — "The implementation is derived from the codebase [2], where the hyper-parameters are described as above. [2]: https://shorturl.at/x56s8. We also share the implementation of our LoRA-X transfer technique. It takes the LoRA-X from the source model, the target model, and filter blocks. The filter blocks are modules where we do not apply the transfer due to low subspace similarity between the source and target model. The output is the LoRA-X for the target model." (This is followed by Algorithm 1, which is pseudocode.) The provided short URL resolves to https://github.com/huggingface/peft, a general-purpose library, not the specific implementation of LoRA-X from this paper.
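The transfer described in that quote (take the source LoRA-X update, project it onto the target model's weight subspace, and skip "filter blocks" whose subspace similarity is too low) can be sketched as follows. This is a hedged NumPy illustration of the general idea, not the paper's Algorithm 1; the function names, rank, and similarity threshold are assumptions introduced here.

```python
import numpy as np

def subspace_similarity(U_s, U_t):
    """Mean squared cosine of the principal angles between two column spaces."""
    return np.mean(np.linalg.svd(U_s.T @ U_t, compute_uv=False) ** 2)

def transfer_lora_x(delta_w_src, w_src, w_tgt, rank=32, sim_threshold=0.3):
    """Project a source LoRA-X weight update onto the target base weight's subspace.

    Returns None for a "filter block", i.e. when the source/target subspace
    similarity falls below the (illustrative) threshold.
    """
    # Truncated singular subspaces of the source and target base weights.
    U_s = np.linalg.svd(w_src, full_matrices=False)[0][:, :rank]
    U_t, _, Vt_t = np.linalg.svd(w_tgt, full_matrices=False)
    U_t, V_t = U_t[:, :rank], Vt_t[:rank].T
    if subspace_similarity(U_s, U_t) < sim_threshold:
        return None  # filter block: do not transfer
    # Project the source update onto the target left/right singular subspaces.
    return U_t @ (U_t.T @ delta_w_src @ V_t) @ V_t.T
```

When the update already lies in the target subspace (e.g. source and target weights coincide), the projection returns it unchanged, which is the sanity property one would expect of a subspace-projection transfer.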
Open Datasets: Yes — "Datasets: For style transfer, we evaluate LoRA-X trained from scratch on base source models and training-free transferred LoRA-X using public datasets like Blue Fire, Origami Styles, and Paintings. We follow the setup described in (Borse et al., 2024). Additional details are in Appendix B." Appendix B: "Blue Fire (Training): The Blue Fire dataset is created by collecting images from the open public domain... Paintings: Along similar lines, the Paintings dataset is also a collection of images from the public domain (CC0 license)... Origami: The Origami dataset is also a collection of origami images from public domains... Pokemon: The Pokemon dataset is a collection of Pokemon images, initially introduced in (Liu et al., 2020)."
Dataset Splits: No — "Blue Fire (Training): The Blue Fire dataset is created by collecting images from the open public domain and consists of six concepts: car, dragon, bird, fox, man, and castle. The dataset has a total of 54 images covering all the concepts. Blue Fire (Validation): The Blue Fire validation set consists of 30 curated text prompts, of which 9 prompts contain one of the 6 categories on which the model was trained, and the remaining 21 prompts correspond to categories on which the low-rank adapter has not been fine-tuned. These contain categories such as: football, monster, sword, chess rook, lion, tiger, dog, cat, koala, panda. For all training experiments validating on this dataset, we produce 30 images per prompt, varying the input seed. Hence, the HPS analysis is over 900 images and the LPIPS-diversity analysis is over 14500 image pairs." The paper describes how validation prompts are used to generate samples for evaluation, but does not specify explicit training/validation/test splits for the underlying datasets themselves.
Hardware Specification: Yes — "Wall clock time is measured on an A100 GPU."
Software Dependencies: No — "The implementation is derived from the codebase [2], where the hyper-parameters are described as above. [2]: https://shorturl.at/x56s8." "Table 8: Ablation on different hyper-parameters for training LoRA-X using base model SDv1.5 and the Blue Fire dataset. ... kohya-ss repository [3] for LoRA fine-tuning. [3]: https://github.com/kohya-ss/sd-scripts." The paper mentions existing codebases (HuggingFace PEFT and the kohya-ss repository) but does not provide specific version numbers for these software dependencies or other key libraries.
Experiment Setup: Yes — "For the adaptation, we use mixed-precision training with FP16, a batch size of 8, gradient accumulation steps of 1, gradient checkpointing, and a learning rate of 1e-3 with a constant scheduler. We use SNR Gamma = 5.0 and train for 5000 iterations. For training LoRA-X on SD Eff-v1.0 and RealVis-V3.0, as well as for all the datasets (Blue Fire, Paintings, and Origami), we follow the same set of hyper-parameters. For the adaptation, we use mixed-precision training with FP16, a batch size of 8, gradient accumulation steps of 1, and a learning rate of 1e-4 with a cosine scheduler. We train for 5000 steps. Table 8 presents the ablation study on hyper-parameters, including steps and batch size, which differ from the default values in the kohya-ss repository [3] for LoRA fine-tuning."
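For quick reference, the two hyper-parameter sets quoted above can be collected into a small config sketch. The key names below are illustrative, not taken from the paper's or kohya-ss's actual configuration files, and the grouping into "first reported setup" vs. "second reported setup" simply follows the order of the quote.

```python
# First reported setup: constant LR schedule with SNR-gamma weighting.
SETUP_A = {
    "mixed_precision": "fp16",
    "batch_size": 8,
    "gradient_accumulation_steps": 1,
    "gradient_checkpointing": True,
    "learning_rate": 1e-3,
    "lr_scheduler": "constant",
    "snr_gamma": 5.0,
    "max_train_steps": 5000,
}

# Second reported setup: lower LR with a cosine schedule, same budget.
SETUP_B = {
    "mixed_precision": "fp16",
    "batch_size": 8,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-4,
    "lr_scheduler": "cosine",
    "max_train_steps": 5000,
}
```

Laying the two out side by side makes the reported differences explicit: only the learning rate, the scheduler, and the SNR-gamma weighting change between the setups.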