Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence

Authors: Wenbo Huang, Jinghui Zhang, Guang Li, Lei Zhang, Shuoyuan Wang, Fang Dong, Jiahui Jin, Takahiro Ogawa, Miki Haseyama

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Manta achieves new state-of-the-art performance on prominent benchmarks, including SSv2, Kinetics, UCF101, and HMDB51. Extensive empirical studies prove that Manta significantly improves FSAR of long sub-sequences from multiple perspectives.
Researcher Affiliation | Academia | (1) Southeast University, Nanjing 211189, Jiangsu, China; (2) Hokkaido University, Sapporo 060-0808, Hokkaido, Japan; (3) Nanjing Normal University, Nanjing 210023, Jiangsu, China; (4) Southern University of Science and Technology, Shenzhen 518055, Guangdong, China. (Author emails redacted in the source.)
Pseudocode | No | The paper describes the methodology with architectural diagrams and mathematical formulations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/wenbohuang1002/Manta
Open Datasets | Yes | Widely used benchmark datasets, such as the temporal-related SSv2 (Goyal et al. 2017), the spatial-related Kinetics (Carreira and Zisserman 2017), UCF101 (Soomro, Zamir, and Shah 2012), and HMDB51 (Kuehne et al. 2011), are selected for proving the effectiveness of Manta.
Dataset Splits | Yes | According to the most common data split (Zhu and Yang 2018; Cao et al. 2020; Zhang et al. 2020), all datasets are divided into disjoint D_train, D_val, and D_test (D_train ∩ D_val ∩ D_test = ∅).
Hardware Specification | Yes | Most experiments are completed on a server with two 32 GB NVIDIA Tesla V100 PCIe GPUs.
Software Dependencies | No | The paper mentions specific backbones (ResNet-50, ViT-B, VMamba-B) and an SGD optimizer, but it does not specify software dependencies with version numbers, such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | We adopt two standard few-shot settings, 5-way 1-shot and 5-way 5-shot, to conduct experiments. ... Extracted features are 2048-dimensional vectors (D = 2048). ... Except for the larger SSv2, which requires training on 75,000 tasks, the other datasets use 10,000 tasks. An SGD optimizer with an initial learning rate of 10⁻³ is applied for training. D_val determines hyper-parameters including the multi-scale set (O = {1, 2, 4}), the temperature (τ = 0.07), and the loss weight factor (λ = 4).
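To make the evaluation protocol in the Experiment Setup row concrete, here is a minimal sketch of one 5-way 1-shot episode. The synthetic feature generator and the nearest-prototype classifier with temperature-scaled cosine scores are illustrative assumptions, not Manta's actual architecture; only the feature dimension D = 2048 and the temperature τ = 0.07 come from the reported setup.

```python
import numpy as np

D = 2048    # feature dimension reported in the paper
TAU = 0.07  # softmax temperature reported in the paper

def sample_episode(features_by_class, n_way=5, k_shot=1, n_query=1, rng=None):
    """Draw one N-way K-shot episode from a dict {class_id: [feature, ...]}."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(sorted(features_by_class), size=n_way, replace=False)
    support, query = [], []
    for label, c in enumerate(classes):  # relabel the sampled classes 0..N-1
        idx = rng.choice(len(features_by_class[c]), size=k_shot + n_query,
                         replace=False)
        feats = [features_by_class[c][i] for i in idx]
        support += [(f, label) for f in feats[:k_shot]]
        query += [(f, label) for f in feats[k_shot:]]
    return support, query

def classify(support, query_feat, tau=TAU):
    """Nearest-prototype prediction using temperature-scaled cosine scores.

    This is a generic few-shot baseline (an assumption), not Manta itself.
    """
    n_way = max(label for _, label in support) + 1
    protos = np.stack([np.mean([f for f, l in support if l == c], axis=0)
                       for c in range(n_way)])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    logits = protos @ q / tau
    return int(np.argmax(logits))

# Toy demo on synthetic, well-separated class directions (pure illustration).
rng = np.random.default_rng(0)

def make_class(c):
    mu = np.zeros(D)
    mu[c] = 10.0  # each class points along a distinct axis
    return [mu + rng.normal(scale=0.1, size=D) for _ in range(4)]

data = {c: make_class(c) for c in range(8)}
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=1, rng=rng)
correct = sum(classify(support, f) == y for f, y in query)
print(correct, "/", len(query))
```

In the actual protocol, accuracy would be averaged over many such episodes (10,000 tasks, or 75,000 for SSv2, per the row above).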