Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence
Authors: Wenbo Huang, Jinghui Zhang, Guang Li, Lei Zhang, Shuoyuan Wang, Fang Dong, Jiahui Jin, Takahiro Ogawa, Miki Haseyama
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Manta achieves new state-of-the-art performance on prominent benchmarks, including SSv2, Kinetics, UCF101, and HMDB51. Extensive empirical studies prove that Manta significantly improves FSAR of long subsequence from multiple perspectives. |
| Researcher Affiliation | Academia | 1Southeast University, Nanjing 211189, Jiangsu, China 2Hokkaido University, Sapporo 060-0808, Hokkaido, Japan 3Nanjing Normal University, Nanjing 210023, Jiangsu, China 4Southern University of Science and Technology, Shenzhen 518055, Guangdong, China EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology with architectural diagrams and mathematical formulations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/wenbohuang1002/Manta |
| Open Datasets | Yes | Widely used benchmark datasets such as temporal-related SSv2 (Goyal et al. 2017), spatial-related Kinetics (Carreira and Zisserman 2017), UCF101 (Soomro, Zamir, and Shah 2012), and HMDB51 (Kuehne et al. 2011) are selected for proving the effectiveness of Manta. |
| Dataset Splits | Yes | According to the most common data split (Zhu and Yang 2018; Cao et al. 2020; Zhang et al. 2020), all datasets are divided into Dtrain, Dval, and Dtest (Dtrain ∩ Dval ∩ Dtest = ∅). |
| Hardware Specification | Yes | Most experiments are completed on a server with two 32GB NVIDIA Tesla V100 PCIe GPUs. |
| Software Dependencies | No | The paper mentions using specific backbones (ResNet-50, ViT-B, VMamba-B) and an SGD optimizer, but it does not specify software dependencies with version numbers such as Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | We adopt two standard few-shot settings including 5-way 1-shot and 5-shot to conduct experiments. ... Features extracted are 2048-dimensional vectors (D = 2048). ... Except for the larger SSv2 which requires 75,000 tasks training, other datasets utilize 10,000 tasks. An SGD optimizer with an initial learning rate of 10⁻³ is applied for training. The Dval determines hyper-parameters including multi-scale (O = {1, 2, 4}), temperature (τ = 0.07) and weight factor of loss (λ = 4). |
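The episodic protocol quoted above (5-way 1-shot / 5-shot, with 10,000 or 75,000 sampled tasks) can be sketched as follows. This is a minimal illustration of N-way K-shot task sampling, not code from the Manta repository; the function and variable names are hypothetical, and the query-set size per class is an assumption (the paper excerpt does not state it).

```python
import random

def sample_episode(class_pool, n_way=5, k_shot=1, n_query=1, seed=None):
    """Sample one N-way K-shot episode from a {class: [clip ids]} pool.

    Hypothetical helper illustrating the standard few-shot protocol
    (5-way 1-shot / 5-shot); names are not taken from the source.
    """
    rng = random.Random(seed)
    # Draw N classes, then K support clips + n_query query clips per class,
    # with no overlap between support and query.
    classes = rng.sample(sorted(class_pool), n_way)
    support, query = {}, {}
    for c in classes:
        clips = rng.sample(class_pool[c], k_shot + n_query)
        support[c] = clips[:k_shot]
        query[c] = clips[k_shot:]
    return support, query

# Toy pool: 10 classes with 6 clips each (stand-in for Dtrain).
pool = {f"class_{i}": [f"clip_{i}_{j}" for j in range(6)] for i in range(10)}
support, query = sample_episode(pool, n_way=5, k_shot=1, n_query=1, seed=0)
assert len(support) == 5 and all(len(v) == 1 for v in support.values())
```

Training would repeat this sampling for the stated number of tasks (75,000 on SSv2, 10,000 elsewhere), optimizing with SGD at an initial learning rate of 10⁻³.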