MambaLCT: Boosting Tracking via Long-term Context State Space Model
Authors: Xiaohai Li, Bineng Zhong, Qihua Liang, Guorong Li, Zhiyi Mo, Shuxiang Song
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that long-term context information enhances the model's ability to perceive targets in complex scenarios. MambaLCT achieves new SOTA performance on six benchmarks while maintaining real-time running speeds. ... Our method has achieved new state-of-the-art tracking performance on six visual tracking benchmarks, including LaSOT, LaSOText, GOT-10K, TrackingNet, TNL2K and UAV123. ... Experiments Implementation Details ... State-of-the-art Comparisons ... Ablation Study |
| Researcher Affiliation | Academia | 1) Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China; 2) Guangxi Colleges and Universities Key Laboratory of Intelligent Software, Wuzhou University, Wuzhou 543002, China; 3) Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing 100101, China |
| Pseudocode | No | The paper describes its methodology using mathematical equations and textual descriptions of the architecture and modules (e.g., the Context Mamba module and the UCA encoder), but it does not include a clearly labeled pseudocode block or algorithm section. |
| Open Source Code | Yes | Code: https://github.com/GXNU-ZhongLab/MambaLCT |
| Open Datasets | Yes | The datasets used for training are GOT-10K (Huang, Zhao, and Huang 2019), LaSOT (Fan et al. 2019), COCO (Lin et al. 2014), and TrackingNet (Muller et al. 2018). ... LaSOT. The LaSOT dataset contains 1400 high-quality video sequences... LaSOText. LaSOText is an extension supplement to the LaSOT dataset... GOT-10K. The GOT-10K dataset contains 10,000 high-quality video sequences... TrackingNet. TrackingNet comprises over 30,000 video sequences... TNL2K. TNL2K is a large-scale dataset for natural language tracking... UAV123. UAV123 is a dataset for low-altitude UAV object tracking... |
| Dataset Splits | Yes | TNL2K is a large-scale dataset for natural language tracking, containing approximately 2,000 video sequences, with a training and testing split ratio of 13:7. |
| Hardware Specification | Yes | Training and testing were conducted on two NVIDIA A100 GPUs. The tracking speed test was performed on a Tesla V100. |
| Software Dependencies | Yes | Our tracker implementation is based on Python 3.8 and PyTorch 1.13.1. |
| Experiment Setup | Yes | During the training process, we set the video clip sampling length to 2 and the sampling quantity to 30,000. ... The learning rate for the backbone network is set to 2e-4, while other parameters have a learning rate ten times higher. We train the model for a total of 300 epochs, and learning rate decay begins at the 240th epoch, with a decay rate set to 1e-4. The same learning rate and decay rate are applied when training on the GOT-10K dataset, but the model is trained for only 150 epochs, with the decay starting at the 120th epoch. The batch size for training is set to 16. In the loss function, λ1 = 5 and λ2 = 2. |
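The optimizer setup in the Experiment Setup row above can be sketched in PyTorch. This is a minimal illustration, not the authors' released code: the `backbone`/`head` module split, the use of AdamW, and the interpretation of the reported "decay rate" as a step drop at the stated milestone epoch are all assumptions; only the learning rates, epoch counts, and milestone come from the table.

```python
import torch
from torch import nn

# Stand-in modules; the real tracker's backbone and prediction head
# (assumed names) would go here.
model = nn.ModuleDict({
    "backbone": nn.Linear(8, 8),
    "head": nn.Linear(8, 4),
})

# Per the table: backbone lr = 2e-4, other parameters 10x higher.
base_lr = 2e-4
optimizer = torch.optim.AdamW([
    {"params": model["backbone"].parameters(), "lr": base_lr},
    {"params": model["head"].parameters(), "lr": base_lr * 10},
])

# 300 epochs total with decay starting at epoch 240 (150/120 for the
# GOT-10K-only protocol). The table's "decay rate set to 1e-4" is
# ambiguous; a one-step multiplicative drop at the milestone is assumed.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[240], gamma=0.1
)
```

In a training loop, `scheduler.step()` would be called once per epoch so the drop takes effect at epoch 240.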