Less Is More: Token Context-Aware Learning for Object Tracking

Authors: Chenlong Xu, Bineng Zhong, Qihua Liang, Yaozong Zheng, Guorong Li, Shuxiang Song

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate the superiority of our tracker, achieving state-of-the-art results on tracking benchmarks such as GOT-10k, TrackingNet, and LaSOT.
Researcher Affiliation Academia (1) Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China (2) Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode No The paper describes the methodology using textual explanations and mathematical formulations (Eqs. 1-6) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets Yes The training data includes LaSOT (Fan et al. 2019), GOT-10k (Huang, Zhao, and Huang 2021), TrackingNet (Müller et al. 2018), and COCO (Lin et al. 2014).
Dataset Splits Yes Following the official requirements, we use only the GOT-10k training set to train our model and evaluate on its test set. TrackingNet (Müller et al. 2018)... We evaluated LMTrack384 on its test set. The LaSOT (Fan et al. 2019) dataset consists of 280 videos in its test set...
Hardware Specification Yes The model is conducted on a server with two 80GB Tesla A100 GPUs, using a batch size of 16, where each batch consists of four search images and one template image.
Software Dependencies No The paper mentions using the 'ViT-base (Dosovitskiy et al. 2021) model' and the 'AdamW' optimizer but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup Yes We employ AdamW to optimize the network parameters with an initial learning rate of 4×10⁻⁵ for the backbone and 4×10⁻⁴ for the rest, and set the weight decay to 10⁻⁴. We train for 300 epochs, randomly sampling 60,000 search images in each epoch; the learning rate drops by a factor of 10 after 240 epochs. The batch size is 16, and the loss weights are λ_iou = 2 and λ_L1 = 5.
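The optimizer and schedule quoted above can be sketched as follows. This is a minimal illustration only, assuming PyTorch (which the paper does not confirm); `backbone` and `head` are hypothetical placeholder modules standing in for the tracker's ViT-base backbone and remaining layers.

```python
import torch
import torch.nn as nn

# Hypothetical placeholders for the tracker's modules (not the authors' code).
backbone = nn.Linear(8, 8)   # stands in for the ViT-base backbone
head = nn.Linear(8, 4)       # stands in for "the rest" of the network

# Two parameter groups with the learning rates quoted in the setup.
param_groups = [
    {"params": backbone.parameters(), "lr": 4e-5},  # backbone: 4×10⁻⁵
    {"params": head.parameters(), "lr": 4e-4},      # rest: 4×10⁻⁴
]
optimizer = torch.optim.AdamW(param_groups, weight_decay=1e-4)

# Learning rate drops by a factor of 10 after epoch 240 (of 300 total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[240], gamma=0.1
)

# Loss weights quoted in the setup (loss = λ_iou·L_iou + λ_L1·L_1).
LAMBDA_IOU, LAMBDA_L1 = 2.0, 5.0
```

Calling `scheduler.step()` once per epoch reproduces the quoted schedule: both group learning rates are scaled by 0.1 once epoch 240 is reached.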