Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
Authors: Tianyi Zhang, Junda Su, Aditya Desai, Oscar Wu, Zhaozhuo Xu, Anshumali Shrivastava
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive evaluations with Llama and Mistral models demonstrate that SketchTune outperforms leading PEFT methods across diverse tasks while using substantially smaller base models and comparable trainable parameters. As a highlight, SketchTune outperforms LoRA, DoRA, and S2FT on commonsense and math benchmarks using 2.6–3.5× smaller base models and exceeds LoftQ in accuracy by 14.48% on GSM8K with 7.3× fewer trainable parameters. |
| Researcher Affiliation | Collaboration | 1 Rice University, Houston, TX; 2 xMAD.ai; 3 University of California, Berkeley, Berkeley, CA; 4 Stevens Institute of Technology, Hoboken, NJ; 5 ThirdAI Corp.; 6 Ken Kennedy Institute. |
| Pseudocode | Yes | Algorithm 1 Learning to Sketch LLM Weights |
| Open Source Code | Yes | Our code and model checkpoints are available publicly: https://github.com/LeanModels/SketchTune |
| Open Datasets | Yes | For math problem-solving, we fine-tune these models on the Math10K dataset and evaluate on 7 different math reasoning datasets (Hu et al., 2023). For commonsense reasoning, we fine-tune on the Commonsense170K dataset and evaluate on 8 different commonsense reasoning datasets (Hu et al., 2023). To compare SketchTune against efficient quantized model fine-tuning methods, we follow the settings in Li et al. (2023b) to fine-tune and test Llama-2 models on the language modeling dataset WikiText-2 (Merity et al., 2016) and the math reasoning dataset GSM8K (Cobbe et al., 2021). |
| Dataset Splits | Yes | The WikiText-2 dataset (Merity et al., 2016) contains 44.8K examples in total: 36.7K training, 3.76K validation, and 4.36K test. Following LoftQ (Li et al., 2023b), we used the training set to perform fine-tuning and the validation set to evaluate the fine-tuned model's performance. |
| Hardware Specification | Yes | We sketch each model using a single Quadro RTX 8000 (48GB) GPU. All fine-tuning experiments are performed on a single NVIDIA A100 (40GB) GPU. |
| Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019)" and the "Transformers library (Wolf et al., 2020)" but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We optimize SketchTune's hyper-parameters, including learning rate and batch size, through a parameter sweep, and we report the hyper-parameters for training in Appendix I. Appendix I contains tables with hyper-parameter selections for fine-tuning SketchTune on various tasks, including learning rate, optimizer, batch size, epochs, LR scheduler, and warmup steps. |
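The table above confirms the paper provides pseudocode ("Algorithm 1: Learning to Sketch LLM Weights") but the algorithm itself is not quoted here. As a rough intuition for what a fine-tunable weight sketch looks like, the toy below shows HashedNet-style random parameter sharing: a fixed index map sends every position of a large weight matrix to a slot in a much smaller trainable vector, so fine-tuning touches only the small vector. This is an illustrative assumption, not SketchTune's Algorithm 1; all names (`make_index_map`, `reconstruct_weight`, `sketch_size`) are hypothetical.

```python
import random

def make_index_map(out_features, in_features, sketch_size, seed=0):
    """Fixed (non-trainable) map from each weight position to a sketch slot."""
    rng = random.Random(seed)
    return [[rng.randrange(sketch_size) for _ in range(in_features)]
            for _ in range(out_features)]

def reconstruct_weight(idx, sketch):
    """Rebuild the full weight matrix on the fly by indexing the sketch."""
    return [[sketch[j] for j in row] for row in idx]

# An 8x16 weight matrix (128 values) represented by a 32-entry sketch.
sketch_size = 32
idx = make_index_map(8, 16, sketch_size)
rng = random.Random(1)
sketch = [0.02 * rng.random() for _ in range(sketch_size)]  # trainable part
W = reconstruct_weight(idx, sketch)
print(len(W), len(W[0]), len(sketch))  # 8 16 32
```

Only the 32 sketch entries would receive gradient updates in a real trainer; the index map stays frozen, which is what makes the trainable-parameter count so much smaller than the reconstructed matrix.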