Mamba State-Space Models Are Lyapunov-Stable Learners
Authors: John Timothy Halloran, Manbir S Gulati, Paul F Roysdon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that Mamba LLMs are extremely stable to changes introduced by combinations of MPFT and PEFT... We empirically validate these theoretical results; for a large number of randomly generated SSM layers, we show that manually adjusting initial latent and input states produces maximum deviations in the output states which exponentially decrease over discrete time. Furthermore, by expanding previous divergence performance metrics (Dettmers et al., 2022; Dettmers & Zettlemoyer, 2023; Dettmers et al., 2024) and evaluating combinations of MPFT and PEFT, we show that fine-tuned Mamba LLMs do not substantially deviate in performance compared to full-precision full fine-tuning. |
| Researcher Affiliation | Industry | John T. Halloran (Leidos); Manbir Gulati (Leidos); Paul Roysdon (Leidos) |
| Pseudocode | No | The paper includes mathematical equations and theoretical proofs but does not present any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper discusses the use of existing tools and mentions official implementations and Huggingface documentation for Mamba models. However, it does not provide an explicit statement from the authors about releasing their own source code for the methodology described in the paper, nor does it provide a direct link to a code repository for their specific work. |
| Open Datasets | Yes | Using the Alpaca dataset (Taori et al., 2023)... All models were evaluated using the LM evaluation harness from Eleuther AI (Gao et al., 2023). Model performance is measured as percent accuracy using the MMLU (Hendrycks et al., 2020) and Winogrande (Sakaguchi et al., 2021) datasets... The Alpaca dataset is freely available for download at https://huggingface.co/datasets/tatsu-lab/alpaca under open-source license CC-by-NC 4.0. The Open Hermes dataset is freely available for download at https://huggingface.co/datasets/teknium/OpenHermes-2.5 under open-source license MIT, Apache 2.0, CC. |
| Dataset Splits | No | The paper mentions using datasets for fine-tuning (Alpaca, LIMA, Open Hermes) and evaluation (MMLU, Winogrande) and specifies few-shot settings ({0, 1, 3, 5}-shot performance) for evaluation. However, it does not explicitly provide details about how the fine-tuning datasets were split into training, validation, or test sets for their experiments, nor does it cite predefined splits for these specific tasks. |
| Hardware Specification | Yes | Each fine-tuning run occurred on a single Nvidia A10G GPU (24 GB total memory). |
| Software Dependencies | Yes | All fine-tuning experiments were run using package versions Transformers 4.40.0.dev0, Accelerate 0.28.0, TRL 0.8.1, PyTorch 2.2.1+cu121, and PEFT 0.10.0. All Mamba-2 models were run using mamba-ssm v2.2.2 using Huggingface checkpoints... For MPFT, Flash Attention 2.0 (Dao et al., 2022) via flash_attn 2.5.7 was used for Pythia models. |
| Experiment Setup | Yes | Mamba 160M, 410M, and 790M models are fine-tuned for three epochs with a maximum sequence length of 512. ... The training recipe for all models was adapted from Tunstall et al. (2023), with the AdamW_torch optimizer and a cosine annealing schedule. ... For both Pythia and Mamba models, the learning rate and LoRA dimension r were scaled to improve performance of smaller models (per-model values listed in Table 1). ... Training epochs used for all Alpaca and Open Hermes experiments were three and one, respectively. |
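The stability result quoted in the Research Type row (perturbing initial latent states produces output deviations that decay exponentially over discrete time) can be illustrated with a minimal numerical sketch. This is an assumed toy setup, not the paper's experiment: a discrete-time linear SSM x_{t+1} = A x_t + B u_t, y_t = C x_t, with a diagonal A whose entries lie inside the unit circle (mirroring the diagonal state matrices of Mamba layers); all dimensions and distributions are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 8, 50  # state dimension and number of discrete time steps

# Diagonal state matrix with spectral radius < 1 -> Lyapunov-stable system.
A = np.diag(rng.uniform(0.1, 0.9, size=n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

def run(x0, u):
    """Roll out the SSM from initial state x0 under input sequence u."""
    x, ys = x0.copy(), []
    for t in range(T):
        x = A @ x + B @ u[t]
        ys.append((C @ x).item())
    return np.array(ys)

u = rng.standard_normal((T, 1))
y_ref = run(np.zeros(n), u)              # nominal trajectory
y_pert = run(rng.standard_normal(n), u)  # perturbed initial latent state

# Output deviation is |C A^{t+1} delta|, bounded by a geometrically
# decaying envelope, so it shrinks toward zero over discrete time.
dev = np.abs(y_ref - y_pert)
print(dev[0], dev[-1])
```

Because the two rollouts share the same inputs, the deviation depends only on the propagated initial-state perturbation, which contracts at every step when all eigenvalues of A have magnitude below one.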
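The setup details scattered across the Hardware, Software Dependencies, and Experiment Setup rows can be collected into one hedged configuration sketch. The structure and placeholder comments below are assumptions for illustration; the per-model learning rates and LoRA ranks are deliberately left unset, since they come from the paper's Table 1, which is not reproduced here.

```python
# Hypothetical summary of the reported fine-tuning recipe (adapted from
# Tunstall et al., 2023, per the paper). Not the authors' actual config file.
recipe = {
    "optimizer": "adamw_torch",        # AdamW (PyTorch implementation)
    "lr_scheduler": "cosine",          # cosine annealing schedule
    "max_seq_length": 512,
    "epochs": {"alpaca": 3, "open_hermes": 1},
    "lora_r": None,                    # per-model; see the paper's Table 1
    "learning_rate": None,             # per-model; see the paper's Table 1
    "hardware": "1x NVIDIA A10G (24 GB)",
    "packages": {
        "transformers": "4.40.0.dev0",
        "accelerate": "0.28.0",
        "trl": "0.8.1",
        "torch": "2.2.1+cu121",
        "peft": "0.10.0",
        "mamba-ssm": "2.2.2",
        "flash_attn": "2.5.7",         # MPFT for Pythia baselines only
    },
}
```

Keeping the recipe in one structure like this makes it easy to diff a reproduction attempt against the reported settings.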