Mamba State-Space Models Are Lyapunov-Stable Learners
Authors: John Timothy Halloran, Manbir S Gulati, Paul F Roysdon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that Mamba LLMs are extremely stable to changes introduced by combinations of MPFT and PEFT... We empirically validate these theoretical results; for a large number of randomly generated SSM layers, we show that manually adjusting initial latent and input states produces maximum deviations in the output states which exponentially decrease over discrete time. Furthermore, by expanding previous divergence performance metrics (Dettmers et al., 2022; Dettmers & Zettlemoyer, 2023; Dettmers et al., 2024) and evaluating combinations of MPFT and PEFT, we show that fine-tuned Mamba LLMs do not substantially deviate in performance compared to full-precision full fine-tuning. |
| Researcher Affiliation | Industry | John T. Halloran (Leidos); Manbir Gulati (Leidos); Paul Roysdon (Leidos) |
| Pseudocode | No | The paper includes mathematical equations and theoretical proofs but does not present any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper discusses the use of existing tools and mentions official implementations and Huggingface documentation for Mamba models. However, it does not provide an explicit statement from the authors about releasing their own source code for the methodology described in the paper, nor does it provide a direct link to a code repository for their specific work. |
| Open Datasets | Yes | Using the Alpaca dataset (Taori et al., 2023)... All models were evaluated using the LM evaluation harness from Eleuther AI (Gao et al., 2023). Model performance is measured as percent accuracy using the MMLU (Hendrycks et al., 2020) and Winogrande (Sakaguchi et al., 2021) datasets... The Alpaca dataset is freely available for download at https://huggingface.co/datasets/tatsu-lab/alpaca under open-source license CC-by-NC 4.0. The Open Hermes dataset is freely available for download at https://huggingface.co/datasets/teknium/OpenHermes-2.5 under open-source license MIT, Apache 2.0, CC. |
| Dataset Splits | No | The paper mentions using datasets for fine-tuning (Alpaca, LIMA, Open Hermes) and evaluation (MMLU, Winogrande) and specifies few-shot settings ({0, 1, 3, 5}-shot performance) for evaluation. However, it does not explicitly provide details about how the fine-tuning datasets were split into training, validation, or test sets for their experiments, nor does it cite predefined splits for these specific tasks. |
| Hardware Specification | Yes | Each fine-tuning run occurred on a single Nvidia A10G GPU (24 GB total memory). |
| Software Dependencies | Yes | All fine-tuning experiments were run using package versions Transformers 4.40.0.dev0, Accelerate 0.28.0, TRL 0.8.1, PyTorch 2.2.1+cu121, and PEFT 0.10.0. All Mamba-2 models were run using mamba-ssm v2.2.2 using Huggingface checkpoints... For MPFT, Flash Attention 2.0 (Dao et al., 2022) via flash_attn 2.5.7 was used for Pythia models. |
| Experiment Setup | Yes | Mamba 160M, 410M, and 790M models are fine-tuned for three epochs with a maximum sequence length of 512. ... The training recipe for all models was adapted from Tunstall et al. (2023), with the AdamW_torch optimizer and a cosine annealing schedule. ... For both Pythia and Mamba models, the learning rate and LoRA dimension r were scaled to improve performance of smaller models (per-model values listed in Table 1). ... Training epochs used for all Alpaca and Open Hermes experiments were three and one, respectively. |
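The stability result quoted in the Research Type row (perturbing initial latent states produces output deviations that decay exponentially over discrete time) can be illustrated with a minimal numerical sketch. This is an assumed toy setup, not the paper's experiment: a discrete-time linear SSM x_{t+1} = A x_t + B u_t, y_t = C x_t, with a diagonal A whose entries lie inside the unit circle (mirroring the diagonal state matrices of Mamba layers); all dimensions and distributions are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 8, 50  # state dimension and number of discrete time steps

# Diagonal state matrix with spectral radius < 1 -> Lyapunov-stable system.
A = np.diag(rng.uniform(0.1, 0.9, size=n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

def run(x0, u):
    """Roll out the SSM from initial state x0 under input sequence u."""
    x, ys = x0.copy(), []
    for t in range(T):
        x = A @ x + B @ u[t]
        ys.append((C @ x).item())
    return np.array(ys)

u = rng.standard_normal((T, 1))
y_ref = run(np.zeros(n), u)              # nominal trajectory
y_pert = run(rng.standard_normal(n), u)  # perturbed initial latent state

# Output deviation is |C A^{t+1} delta|, bounded by a geometrically
# decaying envelope, so it shrinks toward zero over discrete time.
dev = np.abs(y_ref - y_pert)
print(dev[0], dev[-1])
```

Because the two rollouts share the same inputs, the deviation depends only on the propagated initial-state perturbation, which contracts at every step when all eigenvalues of A have magnitude below one.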
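The setup details scattered across the Hardware, Software Dependencies, and Experiment Setup rows can be collected into one hedged configuration sketch. The structure and placeholder comments below are assumptions for illustration; the per-model learning rates and LoRA ranks are deliberately left unset, since they come from the paper's Table 1, which is not reproduced here.

```python
# Hypothetical summary of the reported fine-tuning recipe (adapted from
# Tunstall et al., 2023, per the paper). Not the authors' actual config file.
recipe = {
    "optimizer": "adamw_torch",        # AdamW (PyTorch implementation)
    "lr_scheduler": "cosine",          # cosine annealing schedule
    "max_seq_length": 512,
    "epochs": {"alpaca": 3, "open_hermes": 1},
    "lora_r": None,                    # per-model; see the paper's Table 1
    "learning_rate": None,             # per-model; see the paper's Table 1
    "hardware": "1x NVIDIA A10G (24 GB)",
    "packages": {
        "transformers": "4.40.0.dev0",
        "accelerate": "0.28.0",
        "trl": "0.8.1",
        "torch": "2.2.1+cu121",
        "peft": "0.10.0",
        "mamba-ssm": "2.2.2",
        "flash_attn": "2.5.7",         # MPFT for Pythia baselines only
    },
}
```

Keeping the recipe in one structure like this makes it easy to diff a reproduction attempt against the reported settings.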