Parameter-Efficient Fine-Tuning of State Space Models
Authors: Kevin Galim, Wonjun Kang, Yuchen Zeng, Hyung Il Koo, Kangwook Lee
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark six widely used PEFT methods across three categories on diverse tasks, including natural language understanding, generation, and computer vision. We evaluate these methods on both SSM-based models (i.e., Mamba) and a hybrid model (i.e., Jamba (Lieber et al., 2024)). Our results show that LoRA consistently outperforms all other PEFT methods on both SSM-based and hybrid models. Through extensive experiments, we demonstrate that integrating SDT into SSM-based models, combined with applying LoRA to their linear projection matrices, achieves state-of-the-art fine-tuning performance. |
| Researcher Affiliation | Collaboration | 1Furiosa AI 2Seoul National University 3University of Wisconsin-Madison. |
| Pseudocode | Yes | The resulting dimension selection approach is outlined in the pseudo-code (Alg. 1), which corresponds to the update scheme illustrated in Fig. 1. |
| Open Source Code | Yes | The roadmap of our paper is illustrated in Fig. 1. Our code is available at https://github.com/furiosa-ai/ssm-peft. |
| Open Datasets | Yes | We use six datasets spanning different domains: GLUE for natural language understanding (Wang et al., 2019), DART for RDF-to-text generation (Nan et al., 2021), SAMSum (Gliwa et al., 2019) for summarization, Spider for text-to-SQL generation (Yu et al., 2018), and two vision datasets, CIFAR-10 (Krizhevsky et al., 2009) and CelebA (Liu et al., 2015). |
| Dataset Splits | Yes | The dataset characteristics, including our train, validation and test set sizes, sequence lengths, and number of epochs, are summarized in Table 5. |
| Hardware Specification | Yes | All experiments were carried out on a single H100 GPU, and the reported metrics represent averages across the four simulations. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers for replication. |
| Experiment Setup | Yes | We fine-tune pretrained Mamba and Jamba models with AdamW with a linear learning rate decay schedule. For LoRA we set rank to 8, alpha to 8, and dropout to 0.1 for all experiments. |
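For readers unfamiliar with the LoRA hyperparameters quoted above (rank 8, alpha 8), the following is a minimal, framework-free sketch of the standard LoRA forward pass: the frozen weight `W` is augmented by a scaled low-rank update `(alpha / rank) * B A`, with `B` initialized to zero so fine-tuning starts exactly from the pretrained model. The function names and the toy dimensions are illustrative assumptions, not the paper's implementation; the paper applies this scheme to the linear projection matrices of Mamba and Jamba.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, rank, alpha):
    """Standard LoRA forward: W x + (alpha / rank) * B (A x).

    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (rank, d_in)
    B: trainable up-projection, shape (d_out, rank), zero-initialized
    """
    base = matvec(W, x)                 # frozen path
    delta = matvec(B, matvec(A, x))    # low-rank trainable path
    scale = alpha / rank               # LoRA scaling factor
    return [b + scale * d for b, d in zip(base, delta)]

# Toy example: with B all zeros (the usual initialization),
# the LoRA output equals the frozen model's output.
W = [[2.0, 0.0], [0.0, 3.0]]
A = [[1.0, 1.0]]           # rank 1
B = [[0.0], [0.0]]         # zero-initialized
x = [1.0, 1.0]
print(lora_forward(W, A, B, x, rank=1, alpha=1))  # matches matvec(W, x)
```

With rank = alpha = 8 as in the paper, the scaling factor alpha / rank is 1, so the low-rank update is added to the frozen projection unscaled.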