MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations
Authors: Hongyu Ke, Jack Morris, Kentaro Oguchi, Xiaofei Cao, Yongkang Liu, Haoxin Wang, Yi Ding
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate MamBEV's promising performance across diverse visual perception metrics, highlighting its advantages in input scaling efficiency compared to existing benchmark models. A thorough set of ablation studies is provided to showcase model scaling and other properties. We open-source our code and provide a strong baseline and evaluation framework for future experimentation. |
| Researcher Affiliation | Collaboration | ¹Georgia State University, ²InfoTech Labs, Toyota Motor North America R&D |
| Pseudocode | Yes | A.3 ALGORITHMS: The pseudocode of our proposed Spatial Cross Mamba is shown in Algorithm 1. The details of the Cross Quasi-Separable State Space Model (XQSSM) are shown in Algorithm 2. |
| Open Source Code | Yes | The code is available at https://github.com/amai-gsu/MamBEV. We open-source our code and provide a strong baseline and evaluation framework for future experimentation. |
| Open Datasets | Yes | We conduct our experiments using the nuScenes dataset (Caesar et al., 2020). The nuScenes dataset is a large-scale autonomous driving dataset containing 1000 driving scenes from Boston and Singapore. |
| Dataset Splits | No | The paper mentions using the nuScenes dataset but does not explicitly describe how the data was split into training, validation, or test sets for the experiments (e.g., specific percentages, counts, or a reference to predefined splits used by the authors). |
| Hardware Specification | Yes | We trained with an effective batch size of 32 with no gradient accumulation on 8 A100s for 30 epochs, truncated at 24 epochs. The FPS is the average number of samples per second processed by the model in evaluation mode on an RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using an Adam W optimizer and an automatic mixed precision optimizer wrapper, but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We used a learning rate of 8×10⁻⁴, with a linear warmup for 10% of the scheduled steps starting from 8/3×10⁻⁴. Following the warmup, the learning rate follows an epoch-based cosine annealing schedule with a minimum learning rate of 8×10⁻⁷. We trained with an effective batch size of 32 with no gradient accumulation on 8 A100s for 30 epochs, truncated at 24 epochs. Starting from step 100, an exponential moving average according to w̄ₜ = (1 − 0.0002)·w̄ₜ₋₁ + 0.0002·wₜ is applied to all weights. An AdamW optimizer with 0.01 weight decay is used, and training employs an automatic mixed precision optimizer wrapper with an initial gradient scaling of 512. A 0.1 multiplier is applied to the learning rate of the backbone weights and the deformable attention sampling offsets (Zhu et al., 2020). |
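The schedule quoted in the Experiment Setup row (linear warmup for 10% of steps, then cosine annealing to a floor, plus a weight EMA) can be sketched as below. This is a minimal illustration, not the authors' code: the function names are ours, and the warmup start of 8/3×10⁻⁴ (one third of the base rate) is our reading of the excerpt.

```python
import math

def lr_at_step(step, total_steps, base_lr=8e-4, warmup_frac=0.10,
               warmup_start_lr=8e-4 / 3, min_lr=8e-7):
    """Learning rate at a given step: linear warmup over the first
    warmup_frac of steps, then cosine annealing down to min_lr."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear ramp from warmup_start_lr up to base_lr.
        t = step / max(1, warmup_steps)
        return warmup_start_lr + t * (base_lr - warmup_start_lr)
    # Cosine annealing from base_lr to min_lr over the remaining steps.
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

def ema_update(ema_w, w, decay=0.0002):
    """One EMA step on a scalar weight: ema <- (1 - decay)*ema + decay*w."""
    return (1 - decay) * ema_w + decay * w
```

At the end of warmup the rate equals the base rate exactly, and at the final step it reaches the 8×10⁻⁷ floor; in practice frameworks apply the same rule per parameter group (e.g. with the 0.1 backbone multiplier folded into `base_lr`).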