Text-to-LoRA: Instant Transformer Adaption

Authors: Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange

ICML 2025

Reproducibility variables, each listed with its Result and the supporting LLM Response:
Research Type
Result: Experimental
LLM Response: "After training T2L on a suite of 9 pre-trained LoRA adapters (GSM8K, Arc, etc.), we show that the ad-hoc reconstructed LoRA instances match the performance of task-specific adapters across the corresponding test sets. Furthermore, T2L can compress hundreds of LoRA instances and zero-shot generalize to entirely unseen tasks."
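The LoRA adapters discussed above share a simple low-rank structure: the frozen base weight W is adapted by adding a scaled product of two small factors. A minimal plain-Python sketch (dimensions and values are illustrative, not taken from the paper):

```python
# Sketch of applying a LoRA update: W' = W + (alpha / r) * B @ A,
# where B is (d_out, r), A is (r, d_in), and r << min(d_out, d_in).
# Pure-Python matrices (lists of rows); no external libraries assumed.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_lora(W, A, B, alpha, r):
    """Return the LoRA-adapted weight W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Tiny example: d_out = d_in = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # (2, 1)
A = [[0.5, 0.5]]     # (1, 2)
W_adapted = apply_lora(W, A, B, alpha=1.0, r=1)
```

T2L's contribution is generating the A and B factors directly from a text description of the task, rather than training them per task.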
Researcher Affiliation
Result: Industry
LLM Response: "Sakana AI. Correspondence to: Rujikorn Charakorn <EMAIL>, Robert T. Lange <EMAIL>."
Pseudocode
Result: No
LLM Response: The paper describes its method only in regular paragraph text, without structured formatting, and contains no figure, block, or section explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code
Result: Yes
LLM Response: "Our code is available at https://github.com/SakanaAI/text-to-lora."
Open Datasets
Result: Yes
LLM Response: "We utilize the SNI dataset (Wang et al., 2022) for training LoRA adapters. We use 500 SNI datasets publicly available at https://huggingface.co/Lots-of-LoRAs. For evaluation, we choose 10 widely used benchmarks that collectively cover a variety of LLM capability assessments, e.g., reasoning, math, science, coding, and world knowledge. Specifically, we include the following benchmarks: Arc-challenge (ArcC) and Arc-easy (ArcE) (Clark et al., 2018), BoolQ (Clark et al., 2019), GSM8K (Cobbe et al., 2021), Hellaswag (HS) (Zellers et al., 2019), OpenBookQA (OQA) (Mihaylov et al., 2018), PIQA (Bisk et al., 2020), Winogrande (WG) (Keisuke et al., 2019), HumanEval (HE) (Chen et al., 2021), and MBPP (Austin et al., 2021)."
Dataset Splits
Result: Yes
LLM Response: "We use 11 tasks for hold-out validation and removed 10 datasets due to data contamination from the evaluation benchmark tasks, leaving 479 datasets for training. All samples are in English. More details of the datasets can be found in Appendix J." For evaluation, the paper chooses 10 widely used benchmarks. "We evaluate the models on the test split, using chain-of-thought response pre-filling: Let’s think step by step." (J.1.1 GSM8K) "HumanEval only has the test split, therefore it is always evaluated against in the zero-shot manner." (J.1.2 HUMANEVAL AND MBPP)
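Chain-of-thought response pre-filling, as quoted above, means seeding the assistant turn with a fixed prefix so the model continues with step-by-step reasoning. A sketch of the idea, using generic Mistral-style instruction markers as an assumption (the paper's exact prompt template is not given in this report):

```python
# Illustrative chain-of-thought pre-filling: the assistant's response is
# started for it, so generation continues the reasoning rather than
# answering immediately. The [INST] markers are an assumed template.

COT_PREFIX = "Let's think step by step."

def build_prefilled_prompt(question: str) -> str:
    return (
        f"[INST] {question} [/INST] "  # user turn
        f"{COT_PREFIX}"                # pre-filled start of the assistant turn
    )

prompt = build_prefilled_prompt("A question from the GSM8K test split.")
```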
Hardware Specification
Result: Yes
LLM Response: "All models trained in this work fit in a single H100 GPU (80GB of VRAM)."
Software Dependencies
Result: No
LLM Response: The paper mentions software such as Mistral-7B-Instruct (Jiang et al., 2023) as the base LLM and gte-large-en-v1.5 (Li et al., 2023; Zhang et al., 2024) for task embedding, implying the use of libraries such as PyTorch (from torch.cuda.FloatTensor). However, it does not provide version numbers for these or other key software components (e.g., the Python version or specific library versions).
Experiment Setup
Result: Yes
LLM Response: Table 11: Hyperparameters for training a task-specific LoRA adapter.

Hyperparameter | Task-specific LoRA | T2L (SFT) | T2L (recon)
Batch size | 8 | 8 | Number of the target LoRAs
Gradient accumulation steps | 1 | 1 | 1
Max learning rate | 8e-5 | 2.5e-5 | 1e-3
Max gradient norm | 1.0 | 1.0 | 1.0
NEFTune noise alpha | 5.0 | 5.0 | 5.0
Warmup fraction | 0.1 | 0.1 | 0.1
Learning rate scheduler | Linear with warmup | Linear with warmup | Linear with warmup
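The "linear with warmup" schedule in Table 11 ramps the learning rate from zero to the maximum over the first warmup fraction of steps, then decays it linearly. A minimal sketch using the task-specific LoRA column's values (max lr 8e-5, warmup fraction 0.1); decaying to exactly zero at the end is an assumption, as the table does not state the final value:

```python
# Linear warmup followed by linear decay, per the Table 11 settings.
# `total_steps` is illustrative; the paper's actual step count is not
# given in this report.

def lr_at_step(step, total_steps, max_lr=8e-5, warmup_fraction=0.1):
    """Learning rate at a given 0-indexed step."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Ramp linearly up to max_lr over the warmup phase.
        return max_lr * (step + 1) / warmup_steps
    # Decay linearly toward zero over the remaining steps (assumed).
    remaining = total_steps - warmup_steps
    return max_lr * max(0.0, (total_steps - step) / remaining)

schedule = [lr_at_step(s, total_steps=100) for s in range(100)]
```

With 100 steps and a 0.1 warmup fraction, the rate peaks at 8e-5 around step 9 and declines afterward.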