Exploring Enhanced Contextual Information for Video-Level Object Tracking
Authors: Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, Dong Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that MCITrack achieves competitive performance across numerous benchmarks. For instance, it gets 76.6% AUC on LaSOT and 80.0% AO on GOT-10k, establishing a new state-of-the-art performance. |
| Researcher Affiliation | Collaboration | ¹Dalian University of Technology, ²Ningbo Institute of Dalian University of Technology, ³Baidu Inc. |
| Pseudocode | No | The paper describes the model architecture and components (CIF block, Mamba layer) verbally, through mathematical equations (Equations 2–5), and with diagrams (Figure 2, Figure 3), but it does not include a clearly labeled pseudocode or algorithm block with structured steps. |
| Open Source Code | Yes | Code: https://github.com/kangben258/MCITrack |
| Open Datasets | Yes | Our training data includes LaSOT (Fan et al. 2019), GOT-10k (Huang, Zhao, and Huang 2019), TrackingNet (Muller et al. 2018), COCO (Lin et al. 2014), and VastTrack (Peng et al. 2024). |
| Dataset Splits | Yes | LaSOT (Fan et al. 2019) is a large-scale, long-term dataset with 1120 training videos and 208 test videos. ...GOT-10k (Huang, Zhao, and Huang 2019) test set contains 180 videos covering various common tracking challenges. In line with official guidelines, we only use the GOT-10k training set for model training. |
| Hardware Specification | Yes | The speed is measured on an Intel Core i7-8700K CPU @3.70GHz with 47GB RAM and a single 2080 Ti GPU. ...Training is performed on two 80GB Tesla A800 GPUs with a total batch size of 128. |
| Software Dependencies | Yes | All the models are implemented with Python 3.11 and PyTorch 2.1.2. |
| Experiment Setup | Yes | We use the AdamW (Loshchilov and Hutter 2018) optimizer with an initial learning rate of 4 × 10⁻⁵ for the backbone and 4 × 10⁻⁴ for the rest. The weight decay is set to 1 × 10⁻⁴. The model is trained for a total of 300 epochs with 60k samples per epoch, and the learning rate decreases by a factor of 10 after 240 epochs. ...Our objective function includes classification loss, focal loss (Lin et al. 2017), and regression loss, which consists of L1 loss and GIoU loss (Rezatofighi et al. 2019). ...L is the total loss. λc, λl and λg are hyperparameters, with default values λc = 1, λl = 5 and λg = 2. |
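The loss weighting reported in the Experiment Setup row can be sketched as follows. This is a minimal illustration of the weighted sum L = λc·L_focal + λl·L_L1 + λg·L_GIoU with the paper's default weights (λc = 1, λl = 5, λg = 2); the function name and signature are hypothetical, not taken from the authors' released code.

```python
# Hedged sketch of the total tracking loss described in the paper:
# a weighted sum of classification (focal) loss and regression
# (L1 + GIoU) losses. Names here are illustrative assumptions.

def total_loss(focal_loss: float, l1_loss: float, giou_loss: float,
               lambda_c: float = 1.0,   # λc, focal-loss weight (paper default 1)
               lambda_l: float = 5.0,   # λl, L1-loss weight (paper default 5)
               lambda_g: float = 2.0    # λg, GIoU-loss weight (paper default 2)
               ) -> float:
    """Combine per-sample loss terms into the total loss L."""
    return lambda_c * focal_loss + lambda_l * l1_loss + lambda_g * giou_loss

# Example with made-up per-sample loss values:
# 1 * 0.4 + 5 * 0.1 + 2 * 0.2 = 1.3
print(total_loss(0.4, 0.1, 0.2))
```

In practice each term would be a tensor-valued loss (e.g. a focal loss over classification logits and L1/GIoU losses over predicted boxes) averaged over the batch before weighting.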