SUTrack: Towards Simple and Unified Single Object Tracking

Authors: Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, Dong Wang, Huchuan Lu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that SUTrack outperforms previous task-specific counterparts across 11 datasets spanning five SOT tasks, achieving new state-of-the-art performance. Moreover, we provide a range of models catering to edge devices as well as high-performance GPUs, striking a good trade-off between speed and accuracy.
Researcher Affiliation | Collaboration | 1. Dalian University of Technology; 2. Baidu Inc.
Pseudocode | No | The paper describes algorithms and strategies such as 'Soft Token Type Embedding' and 'Task-recognition Training Strategy' using detailed textual descriptions and mathematical formulas (e.g., Equations 3-10), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | Yes | Code: github.com/chenxin-dlut/SUTrack
Open Datasets | Yes | Our training data comprises commonly used datasets for five SOT tasks, including COCO (Lin et al. 2014), LaSOT (Fan et al. 2021), GOT-10k (Huang, Zhao, and Huang 2019), TrackingNet (Muller et al. 2018), VASTTrack (Peng et al. 2024), DepthTrack (Yan et al. 2021c), VisEvent (Wang et al. 2024), LasHeR (Li et al. 2021), and TNL2K (Wang et al. 2021b).
Dataset Splits | Yes | Our training data comprises commonly used datasets for five SOT tasks... In each batch, we sample and mix data from these datasets, with RGB data being sampled at twice the rate of multi-modal data. The template and search images are generated by expanding the target bounding boxes by factors of 2 and 4, respectively. Evaluation is performed on benchmarks such as LaSOT, TrackingNet, and GOT-10k, which use predefined standard splits.
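The sampling and cropping rules quoted above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the RGB vs. multi-modal grouping of datasets and the sqrt(w*h)-based square-crop convention are assumptions; only the 2x sampling rate and the expansion factors of 2 (template) and 4 (search) come from the paper.

```python
import random

# Assumed grouping for illustration; the paper only states that RGB data
# is sampled at twice the rate of multi-modal data.
RGB_DATASETS = ["COCO", "LaSOT", "GOT-10k", "TrackingNet", "VASTTrack"]
MULTIMODAL_DATASETS = ["DepthTrack", "VisEvent", "LasHeR", "TNL2K"]

def sample_dataset(rng: random.Random) -> str:
    """Pick one dataset for a training pair, weighting RGB sources 2x."""
    names = RGB_DATASETS + MULTIMODAL_DATASETS
    weights = [2.0] * len(RGB_DATASETS) + [1.0] * len(MULTIMODAL_DATASETS)
    return rng.choices(names, weights=weights, k=1)[0]

def crop_region(bbox, factor):
    """Expand a (cx, cy, w, h) target box into a square crop whose side is
    sqrt(w * h) * factor -- a common convention in trackers, assumed here."""
    cx, cy, w, h = bbox
    side = (w * h) ** 0.5 * factor
    return (cx - side / 2, cy - side / 2, side, side)

# Template crop uses factor 2, search-region crop uses factor 4 (per the paper).
template = crop_region((100, 100, 40, 40), 2)
search = crop_region((100, 100, 40, 40), 4)
```

With a 40x40 box, the template crop side becomes 80 and the search crop side 160, reflecting the 2x and 4x expansion factors.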
Hardware Specification | Yes | Training is conducted on 4 NVIDIA A40 GPUs, while inference speed is evaluated on a single NVIDIA RTX 2080 Ti GPU. Running speeds are also reported on an Intel Core i9-9900K @ 3.60 GHz CPU and the NVIDIA Jetson AGX Xavier edge device.
Software Dependencies | Yes | The SUTrack models are implemented using Python 3.8 and PyTorch 1.11.
Experiment Setup | Yes | We train the model by mixing data from all five SOT tasks in each batch... For tracking predictions, we use a weighted focal loss for classification and a combination of ℓ1 loss and generalized IoU loss for regression. For task-recognition predictions, we use the cross-entropy loss. The overall loss function is summarized as: L = L_class + λ_G L_GIoU + λ_L1 L_L1 + L_task, where λ_G = 2 and λ_L1 = 5 are the regularization parameters. We train the model with the AdamW optimizer for a total of 180 epochs, with 100,000 image pairs per epoch. The online template update interval is set to 25, with an update confidence threshold of 0.7 by default. A Hanning window penalty is applied...
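The overall loss above is a plain weighted sum, which can be sketched directly. A minimal sketch, assuming scalar per-term losses; the function name and signature are illustrative, while the weights λ_G = 2 and λ_L1 = 5 are taken from the paper.

```python
def total_loss(l_class: float, l_giou: float, l_l1: float, l_task: float,
               lambda_g: float = 2.0, lambda_l1: float = 5.0) -> float:
    """L = L_class + λ_G * L_GIoU + λ_L1 * L_L1 + L_task (paper's overall loss)."""
    return l_class + lambda_g * l_giou + lambda_l1 * l_l1 + l_task

# Example: classification 0.5, GIoU 1.0, L1 0.2, task-recognition 0.3.
loss = total_loss(0.5, 1.0, 0.2, 0.3)
```

Note how the ℓ1 term dominates among the regression losses (weight 5 vs. 2), so box-coordinate errors are penalized most heavily.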