Less Is More: Token Context-Aware Learning for Object Tracking
Authors: Chenlong Xu, Bineng Zhong, Qihua Liang, Yaozong Zheng, Guorong Li, Shuxiang Song
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of our tracker, achieving state-of-the-art results on tracking benchmarks such as GOT-10k, TrackingNet, and LaSOT. |
| Researcher Affiliation | Academia | 1 Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China; 2 Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulations (Eq. 1-6) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | The training data includes LaSOT (Fan et al. 2019), GOT-10k (Huang, Zhao, and Huang 2021), TrackingNet (Müller et al. 2018), and COCO (Lin et al. 2014). |
| Dataset Splits | Yes | Following the official requirements, we only use the GOT-10k training set to train our model and evaluate on its test set. TrackingNet (Müller et al. 2018)... We evaluated LMTrack384 on its test set. The LaSOT (Fan et al. 2019) test set consists of 280 videos... |
| Hardware Specification | Yes | The model is conducted on a server with two 80GB Tesla A100 GPUs, using a batch size of 16, where each batch consists of four search images and one template image. |
| Software Dependencies | No | The paper mentions using the 'ViT-base (Dosovitskiy et al. 2021) model' and the 'AdamW' optimizer but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We employ AdamW to optimize the network parameters with an initial learning rate of 4×10^-5 for the backbone and 4×10^-4 for the rest, and set the weight decay to 10^-4. We train for 300 epochs, randomly sampling 60,000 search images per epoch. The learning rate drops by a factor of 10 after 240 epochs. The batch size is 16, with λ_iou = 2 and λ_L1 = 5. |
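The step learning-rate schedule quoted in the setup row (two parameter groups, a 10× drop after epoch 240 of 300) can be sketched as a small helper; the function name and the exact grouping of "backbone" vs. "rest" are assumptions for illustration, not code from the paper:

```python
def lr_at_epoch(epoch, base_lr, drop_epoch=240, factor=0.1):
    """Step schedule from the quoted setup: the learning rate is
    multiplied by `factor` (a 10x drop) once `drop_epoch` is reached.
    `lr_at_epoch` is a hypothetical helper, not from the paper."""
    return base_lr * factor if epoch >= drop_epoch else base_lr

# Two parameter groups with different base rates, per the quoted setup:
# 4e-5 for the backbone and 4e-4 for the remaining parameters.
backbone_lr_early = lr_at_epoch(0, 4e-5)    # before the drop
rest_lr_late = lr_at_epoch(250, 4e-4)       # after the drop
```

In a PyTorch training loop this would typically be expressed as two optimizer parameter groups plus a step scheduler rather than a manual function, but the paper does not name a framework.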