H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Authors: Siran Chen, Yuxiao Luo, Yue Ma, Yu Qiao, Yali Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct our H-MBA on multi-modal video understanding benchmarks in autonomous driving, including DRAMA (Malla et al. 2023) and BDD-X (Kim et al. 2018). The extensive results show that our H-MBA achieves state-of-the-art performance; e.g., it reaches 66.9% mIoU on risk localization, a 5.5% improvement over the previous SOTA approach (Malla et al. 2023). |
| Researcher Affiliation | Academia | 1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China 3 Shanghai Artificial Intelligence Laboratory, Shanghai, China 4 The Hong Kong University of Science and Technology, Hong Kong, China 5 The Hong Kong Polytechnic University, Hong Kong, China |
| Pseudocode | No | The paper describes the proposed H-MBA framework, C-Mamba, and Q-Mamba modules using textual descriptions and mathematical formulas (equations 1-6), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We conduct our H-MBA on multi-modal video understanding benchmarks in autonomous driving, including DRAMA (Malla et al. 2023) and BDD-X (Kim et al. 2018). |
| Dataset Splits | No | The paper describes the total size of the datasets and frame sampling strategy (L=5 for DRAMA, L=8 for BDD-X) but does not provide specific training, validation, and test dataset split percentages or counts needed for reproduction. |
| Hardware Specification | Yes | All the experiments are done with 4 A6000 GPUs |
| Software Dependencies | No | The paper mentions using Shikra, CLIP ViT-L/14, and Vicuna-7/13B but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We train the model for 5 epochs with a 2e-5 learning rate and a cosine annealing schedule (Loshchilov and Hutter 2016). |
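
The paper reports only the hyperparameters (5 epochs, 2e-5 learning rate, cosine annealing) and does not release code, so the exact training loop is unknown. A minimal sketch of the stated schedule, assuming annealing over epochs from the base learning rate down to zero, might look like:

```python
import math

def cosine_lr(epoch, total_epochs, base_lr=2e-5, min_lr=0.0):
    """Cosine annealing (Loshchilov and Hutter 2016): decay base_lr
    toward min_lr following half a cosine period over total_epochs."""
    return min_lr + 0.5 * (base_lr - min_lr) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )

# Per-epoch learning rates for the reported 5-epoch run.
epochs = 5
lrs = [cosine_lr(e, epochs) for e in range(epochs)]
```

The schedule starts at the reported 2e-5 and decays monotonically; whether the authors annealed per epoch or per step is not specified in the paper.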