Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation

Authors: Tianyi Zhang, Junda Su, Aditya Desai, Oscar Wu, Zhaozhuo Xu, Anshumali Shrivastava

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our extensive evaluations with Llama and Mistral models demonstrate that SketchTune outperforms leading PEFT methods across diverse tasks while using substantially smaller base models and comparable trainable parameters. As a highlight, SketchTune outperforms LoRA, DoRA, and S2FT on commonsense and math benchmarks using 2.6-3.5x smaller base models and exceeds LoftQ in accuracy by 14.48% on GSM8K with 7.3x fewer trainable parameters.
Researcher Affiliation Collaboration 1Rice University, Houston, TX; 2xMAD.ai; 3University of California, Berkeley, Berkeley, CA; 4Stevens Institute of Technology, Hoboken, NJ; 5ThirdAI Corp.; 6Ken Kennedy Institute.
Pseudocode Yes Algorithm 1 Learning to Sketch LLM Weights
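The report quotes only the title of Algorithm 1, not its body. As a rough, hypothetical illustration of the general idea behind learning a small fine-tunable sketch of model weights (a hash-based, count-sketch-style parameter-sharing scheme; the sizes, names, and fitting rule below are this sketch's own assumptions, not the paper's actual algorithm):

```python
import random

random.seed(0)

d, k = 4096, 256  # d original weights compressed into a k-entry trainable sketch

# Fixed random hash: each weight index maps to one sketch bucket with a sign.
bucket = [random.randrange(k) for _ in range(d)]
sign = [random.choice((-1.0, 1.0)) for _ in range(d)]

def decompress(sketch):
    """Reconstruct an approximate length-d weight vector from the k-entry sketch."""
    return [sign[i] * sketch[bucket[i]] for i in range(d)]

# "Learning to sketch": fit the sketch to approximate given weights W.
# For this simple scheme the per-bucket least-squares optimum is the
# mean of sign * W over the indices hashed to that bucket.
W = [random.gauss(0.0, 1.0) for _ in range(d)]
sums, counts = [0.0] * k, [0] * k
for i in range(d):
    sums[bucket[i]] += sign[i] * W[i]
    counts[bucket[i]] += 1
sketch = [s / c if c else 0.0 for s, c in zip(sums, counts)]

W_hat = decompress(sketch)
mse = sum((w - wh) ** 2 for w, wh in zip(W, W_hat)) / d
```

Fine-tuning would then update only the k sketch entries rather than the d original weights, which is where the trainable-parameter savings come from in this style of method.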
Open Source Code Yes Our code and model checkpoints are available publicly at https://github.com/LeanModels/SketchTune.
Open Datasets Yes For math problem-solving, we fine-tune these models on the Math10K dataset and evaluate on 7 different math reasoning datasets (Hu et al., 2023). For commonsense reasoning, we fine-tune on the Commonsense170K dataset and evaluate on 8 different commonsense reasoning datasets (Hu et al., 2023). To compare SketchTune against efficient quantized model fine-tuning methods, we follow the settings in Li et al. (2023b) to fine-tune and test Llama-2 models on the language modeling dataset WikiText-2 (Merity et al., 2016) and the math reasoning dataset GSM8K (Cobbe et al., 2021).
Dataset Splits Yes The WikiText-2 dataset (Merity et al., 2016) consists of 44.8K examples in total: 36.7K training data, 3.76K validation data, and 4.36K test data. Following LoftQ (Li et al., 2023b), we used the training set to perform fine-tuning and the validation set to evaluate the fine-tuned model's performance.
Hardware Specification Yes We sketch each model using a single Quadro RTX 8000-48GB GPU. All fine-tuning experiments are performed on a single NVIDIA A100-40GB GPU.
Software Dependencies No The paper mentions "PyTorch (Paszke et al., 2019)" and the "Transformers library (Wolf et al., 2020)" but does not provide specific version numbers for these software dependencies.
Experiment Setup Yes We optimize SketchTune's hyperparameters, including learning rate and batch size, through a parameter sweep, and we report the hyperparameters for training in Appendix I. Appendix I contains tables with the per-task hyperparameter selections for fine-tuning SketchTune: learning rate, optimizer, batch size, epochs, LR scheduler, and warmup steps.
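A parameter sweep of this kind can be sketched as a grid search over candidate settings. The grid values and the scoring function below are hypothetical placeholders (the paper's actual per-task choices are in its Appendix I); in practice `evaluate` would fine-tune the model and return a validation metric:

```python
from itertools import product

# Hypothetical sweep grid, not the paper's actual search space.
grid = {
    "lr": [1e-4, 3e-4, 1e-3],
    "batch_size": [16, 32],
}

def evaluate(lr, batch_size):
    # Placeholder for fine-tuning + validation; here a mock score that
    # happens to peak at lr=3e-4 with the smaller batch size.
    return -abs(lr - 3e-4) - 0.001 * batch_size

# Exhaustively score every (lr, batch_size) combination, keep the best.
best = max(
    ({"lr": lr, "batch_size": bs} for lr, bs in product(grid["lr"], grid["batch_size"])),
    key=lambda cfg: evaluate(**cfg),
)
print(best)
```

With real training in `evaluate`, the same loop yields the per-task selections that a report like Appendix I tabulates.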