SUTrack: Towards Simple and Unified Single Object Tracking
Authors: Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, Dong Wang, Huchuan Lu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that SUTrack outperforms previous task-specific counterparts across 11 datasets spanning five SOT tasks. Moreover, we provide a range of models catering to edge devices as well as high-performance GPUs, striking a good trade-off between speed and accuracy. Experiments demonstrate that our SUTrack method is effective, achieving new state-of-the-art performance across 11 benchmarks and five SOT tasks. |
| Researcher Affiliation | Collaboration | ¹Dalian University of Technology, ²Baidu Inc |
| Pseudocode | No | The paper describes algorithms and strategies like 'Soft Token Type Embedding' and 'Task-recognition Training Strategy' using detailed textual descriptions and mathematical formulas (e.g., equations 3, 4, 5, 6, 7, 8, 9, 10). However, it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Code: github.com/chenxin-dlut/SUTrack |
| Open Datasets | Yes | Our training data comprises commonly used datasets for five SOT tasks, including COCO (Lin et al. 2014), LaSOT (Fan et al. 2021), GOT-10k (Huang, Zhao, and Huang 2019), TrackingNet (Muller et al. 2018), VastTrack (Peng et al. 2024), DepthTrack (Yan et al. 2021c), VisEvent (Wang et al. 2024), LasHeR (Li et al. 2021), and TNL2K (Wang et al. 2021b). |
| Dataset Splits | Yes | Our training data comprises commonly used datasets for five SOT tasks... In each batch, we sample and mix data from these datasets, with RGB data being sampled at twice the rate of multi-modal data. The template and search images are generated by expanding the target bounding boxes by factors of 2 and 4, respectively. We train the model with AdamW optimizer. The paper also reports evaluation on benchmarks such as LaSOT, TrackingNet, and GOT-10k, which typically use predefined standard splits. |
| Hardware Specification | Yes | Training is conducted on 4 NVIDIA A40 GPUs, while inference speed is evaluated on a single NVIDIA 2080TI GPU. ...running speeds on both the Intel Core i9-9900K @ 3.60GHz CPU and the NVIDIA Jetson AGX Xavier edge device. |
| Software Dependencies | Yes | The SUTrack models are implemented using Python 3.8 and PyTorch 1.11. |
| Experiment Setup | Yes | We train the model by mixing data from all five SOT tasks in each batch... For tracking predictions... we use a weighted focal loss for classification and a combination of ℓ1 loss and generalized IoU loss for regression. For task-recognition predictions, we use the cross-entropy loss... The overall loss function is summarized as: $L = L_{class} + \lambda_{G} L_{IoU} + \lambda_{L_1} L_{L_1} + L_{task}$, where $\lambda_{G} = 2$ and $\lambda_{L_1} = 5$ are the regularization parameters. ...We train the model with AdamW optimizer. The model is trained for a total of 180 epochs, with 100,000 image pairs per epoch. ... The online template update interval is set to 25, with an update confidence threshold of 0.7 by default. A Hanning window penalty is applied... |
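
The weighted loss and the template update rule quoted in the Experiment Setup row can be illustrated with a minimal pure-Python sketch. It is not taken from the SUTrack repository. The function names, the `(x1, y1, x2, y2)` box format, the mean-reduced ℓ1 term, and the exact form of the update rule (update on every 25th frame when confidence exceeds 0.7) are all assumptions made for illustration. Only the weights 2 and 5, the interval 25, and the threshold 0.7 come from the quoted text. The classification and task-recognition losses are treated as precomputed scalars.

```python
# Illustrative sketch only; names and conventions here are hypothetical.

LAMBDA_G = 2.0   # weight on the generalized IoU loss (value from the paper)
LAMBDA_L1 = 5.0  # weight on the L1 loss (value from the paper)


def giou_loss(pred, target, eps=1e-9):
    """Generalized IoU loss, 1 - GIoU, for two (x1, y1, x2, y2) boxes."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    area_p = max(px2 - px1, 0.0) * max(py2 - py1, 0.0)
    area_t = max(tx2 - tx1, 0.0) * max(ty2 - ty1, 0.0)
    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(min(px2, tx2) - max(px1, tx1), 0.0)
    inter_h = max(min(py2, ty2) - max(py1, ty1), 0.0)
    inter = inter_w * inter_h
    union = area_p + area_t - inter
    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(px2, tx2) - min(px1, tx1)) * (max(py2, ty2) - min(py1, ty1))
    giou = inter / (union + eps) - (enclose - union) / (enclose + eps)
    return 1.0 - giou


def l1_loss(pred, target):
    """Mean absolute difference over the four box coordinates (assumed reduction)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def total_loss(class_loss, pred_box, target_box, task_loss):
    """L = L_class + lambda_G * L_IoU + lambda_L1 * L_L1 + L_task."""
    return (class_loss
            + LAMBDA_G * giou_loss(pred_box, target_box)
            + LAMBDA_L1 * l1_loss(pred_box, target_box)
            + task_loss)


def should_update_template(frame_idx, confidence, interval=25, threshold=0.7):
    """Assumed rule: update on every `interval`-th frame if confidence is high enough."""
    return frame_idx > 0 and frame_idx % interval == 0 and confidence > threshold
```

For example, `total_loss(0.5, box, box, 0.25)` with identical predicted and target boxes reduces to the sum of the classification and task terms, since both regression terms vanish.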