MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods

Authors: Yuxuan Yue, Xing Hu, Dawei Yang, Zhihang Yuan, Zixu Jiang, Zhixuan Chen, Jiangyong Yu, Xu Chen, Sifan Zhou

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments show that MambaQuant can quantize both weights and activations into 8-bit with less than 1% accuracy loss for Mamba-based vision and language tasks.
Researcher Affiliation Collaboration Houmo AI; Harbin Institute of Technology (Shenzhen); Nanjing University; Southeast University
Pseudocode No The paper includes architectural diagrams, mathematical formulations, and textual descriptions of methods, but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes As a pioneering study on quantization within the Mamba family, we have published the code in the hope of promoting further research and facilitating advancements in this field.
Open Datasets Yes For vision tasks, we tested the model on the image classification dataset ImageNet (Russakovsky et al., 2015) and the video classification dataset UCF-101 (Soomro et al., 2012). In the language domain, we conducted evaluations on five standard datasets: ARC-E (Boratko et al., 2018), ARC-C (Clark et al., 2018), PIQA (Bisk et al., 2020), Winogrande (Sakaguchi et al., 2021), and HellaSwag (Zellers et al., 2019).
Dataset Splits Yes The calibration data for image classification consisted of 128 images randomly sampled from the ImageNet (Russakovsky et al., 2015) test set, while for video classification, we used samples from the UCF-101 (Soomro et al., 2012) test set for calibration.
Hardware Specification No The paper discusses the parameter count and computational load of the Mamba-2.8b model but does not specify the hardware (e.g., GPU model, CPU type) used for running the experiments.
Software Dependencies No The paper mentions "PyTorch (PyTorch, 2023)" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup No The paper mentions using static and dynamic quantization approaches and the size of the calibration set (128 images) but does not provide specific hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings for the experiments.
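For readers unfamiliar with the W8A8 setting the report refers to, the sketch below illustrates plain symmetric per-tensor 8-bit quantization. This is a generic baseline for intuition only; the function names are hypothetical, and MambaQuant's actual method additionally applies variance-aligned rotations before quantizing, which this sketch does not implement.

```python
import numpy as np

def quantize_per_tensor(x, n_bits=8):
    # Symmetric per-tensor quantization: one scale derived from the
    # maximum absolute value, mapping floats onto signed 8-bit integers.
    qmax = 2 ** (n_bits - 1) - 1                      # 127 for 8-bit
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original tensor.
    return q.astype(np.float32) * scale

# Round-trip a random "weight" tensor and measure the worst-case error,
# which is bounded by half the quantization step (0.5 * scale).
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_per_tensor(x)
x_hat = dequantize(q, s)
max_err = np.abs(x - x_hat).max()
```

Outlier channels inflate `scale` and waste integer levels on the rest of the tensor; rotation-based methods such as the paper's aim to equalize the distribution before this quantization step so the 8-bit grid is used more evenly.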