Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Authors: Chaodong Xiao, Minghan Li, Zhengqiang Zhang, Deyu Meng, Lei Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that Spatial-Mamba, even with a single scan, attains or surpasses the state-of-the-art SSM-based models in image classification, detection and segmentation. In this section, we conduct a series of experiments to compare Spatial-Mamba with leading benchmark models.
Researcher Affiliation | Collaboration | Chaodong Xiao1,2, Minghan Li1,3, Zhengqiang Zhang1,2, Deyu Meng4, Lei Zhang1,2 (1The Hong Kong Polytechnic University; 2OPPO Research Institute; 3Harvard Medical School; 4Xi'an Jiaotong University)
Pseudocode | No | The paper describes the methodology using equations (e.g., Eqs. 1, 2, 3) and textual descriptions in sections such as '4.1 FORMULATION OF SPATIAL-MAMBA' and '4.2 NETWORK ARCHITECTURE', but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | Yes | Source code and trained models can be found at https://github.com/EdwardChasel/Spatial-Mamba.
Open Datasets | Yes | We first evaluate the representation learning capabilities of Spatial-Mamba in image classification on ImageNet-1K (Deng et al., 2009). We evaluate Spatial-Mamba in object detection and instance segmentation tasks using the COCO 2017 dataset (Lin et al., 2014) and the MMSegmentation toolkit (Contributors, 2020). To assess the performance of Spatial-Mamba on the semantic segmentation task, we train our models with the widely used UPerNet segmentor (Xiao et al., 2018) and the MMSegmentation toolkit (Contributors, 2020) on the ADE20K dataset (Zhou et al., 2019).
Dataset Splits | Yes | Following previous works (Liu et al., 2021; 2024), we train three variants of Spatial-Mamba... We adopted the experimental configurations used in previous works (Liu et al., 2021; 2024)... Following common practices (Liu et al., 2021; 2024), we fine-tune the pre-trained models for 12 epochs (1× schedule) and 36 epochs with multi-scale inputs (3× schedule).
Hardware Specification | Yes | Throughput is measured using an A100 GPU with an input resolution of 224×224.
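Throughput numbers of this kind are typically obtained by timing repeated forward passes after a warm-up period. A minimal, device-agnostic sketch of that measurement loop follows; the batch size, iteration counts, and `model_fn` placeholder are assumptions, and on a real A100 each call would additionally synchronize the GPU (e.g., via `torch.cuda.synchronize`) before reading the clock:

```python
import time

def measure_throughput(model_fn, batch_size=128, n_batches=50, warmup=10):
    """Rough throughput in images/sec. `model_fn` stands in for one
    forward pass over a batch; warm-up iterations are excluded so that
    one-time costs (kernel compilation, caching) do not skew the result."""
    for _ in range(warmup):
        model_fn()  # warm-up passes, not timed
    start = time.perf_counter()
    for _ in range(n_batches):
        model_fn()  # timed passes
    elapsed = time.perf_counter() - start
    return batch_size * n_batches / elapsed
```

The returned value is the quantity usually reported in throughput columns: total images processed divided by wall-clock time for the timed passes only.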
Software Dependencies | No | The paper mentions the 'MMDetection library (Chen et al., 2019)', the 'MMSegmentation toolkit (Contributors, 2020)', and 'CUDA kernels', but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | The Spatial-Mamba-T/S/B models are trained from scratch for 300 epochs using the AdamW optimizer with betas set to (0.9, 0.999), momentum set to 0.9, and batch size set to 1024. The initial learning rate is set to 0.001 with a weight decay of 0.05. A cosine annealing learning rate schedule is adopted with a warm-up of 20 epochs. We adopt the common data augmentation strategies as in previous works (Liu et al., 2021; 2024). Moreover, label smoothing (0.1), exponential moving average (EMA) and MESA (Du et al., 2022) are also applied. The drop path rate is set to 0.2 for Spatial-Mamba-T, 0.3 for Spatial-Mamba-S and 0.5 for Spatial-Mamba-B.
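The schedule described above (base LR 0.001, 20 warm-up epochs, cosine annealing over 300 epochs) can be sketched as a per-epoch learning-rate function. This is a minimal illustration under assumptions the paper does not state: linear warm-up starting from zero and a final learning rate of zero.

```python
import math

def cosine_lr(epoch, base_lr=1e-3, warmup_epochs=20, total_epochs=300, min_lr=0.0):
    """Learning rate for a given epoch under cosine annealing with
    linear warm-up (warm-up shape and min_lr are assumptions)."""
    if epoch < warmup_epochs:
        # linear warm-up: ramps from base_lr/warmup_epochs up to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    # cosine decay from base_lr down to min_lr over the remaining epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

For example, the rate reaches the full 0.001 at the end of warm-up (epoch 19) and decays smoothly toward zero by epoch 300; in practice this would be wired into an optimizer step loop (e.g., PyTorch's `CosineAnnealingLR` plus a warm-up scheduler).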