Cross-modulated Attention Transformer for RGBT Tracking
Authors: Yun Xiao, Jiacong Zhao, Andong Lu, Chenglong Li, Bing Yin, Yin Lin, Cong Liu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on five public RGBT tracking benchmarks show the outstanding performance of the proposed CAFormer against state-of-the-art methods. |
| Researcher Affiliation | Collaboration | 1 School of Artificial Intelligence, Anhui University, Hefei, China ... 3 iFLYTEK CO., LTD., Hefei, China |
| Pseudocode | No | The paper describes the method using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/opacity-black/CAFormer |
| Open Datasets | Yes | Experiments on five public RGBT tracking benchmarks... Our experiments are conducted on five public datasets: GTOT (Li et al. 2016), RGBT210 (Li et al. 2017), RGBT234 (Li et al. 2019a), LasHeR (Li et al. 2021), and VTUAV (Pengyu et al. 2022). |
| Dataset Splits | Yes | We train our model for 10 epochs on the training set of LasHeR (Li et al. 2021)... For GTOT (Li et al. 2016), RGBT210 (Li et al. 2017), and RGBT234 (Li et al. 2019a), we directly evaluate our model without any further fine-tuning. For the VTUAV (Pengyu et al. 2022) dataset, we adopt the VTUAV training set for our training process, and adjust the number of training epochs to 5. |
| Hardware Specification | Yes | For the training process, CAFormer is trained on 2 NVIDIA 2080ti GPUs... Additionally, we complete the speed test on a device with an Nvidia RTX 3080ti GPU. |
| Software Dependencies | No | The paper mentions the use of 'AdamW (Loshchilov and Hutter 2017)' as the optimization algorithm, but does not specify versions for other key software components like programming languages or libraries. |
| Experiment Setup | Yes | In our method, the proposed CAFormer block is integrated into the last 3 layers of the backbone, and the CTE strategy is adopted at layers 3, 6 and 9. The search regions are resized to 256×256, while the templates are resized to 128×128. For the training process, CAFormer is trained on 2 NVIDIA 2080ti GPUs with a global batch size of 32. We set the learning rates of the backbone network and other parameters to 5e-6 and 5e-5, respectively. The optimization algorithm employed is AdamW (Loshchilov and Hutter 2017) with a weight decay of 1e-4. We train our model for 10 epochs on the training set of LasHeR... For the VTUAV (Pengyu et al. 2022) dataset, we adopt the VTUAV training set for our training process, and adjust the number of training epochs to 5. Following previous work (Hui et al. 2023), all experiments in this paper are loaded with pre-trained weights from the public SOT method (Ye et al. 2022). |
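The experiment-setup row above reports concrete hyperparameters (optimizer, per-group learning rates, batch size, epochs, input resolutions). The sketch below gathers them into a single configuration dict; the dict layout and key names are illustrative assumptions, since the paper reports only the numeric values, not a concrete config format.

```python
# Hedged sketch of the reported CAFormer training configuration.
# Key names ("param_groups", "search_size", etc.) are assumptions;
# the numeric values come from the Experiment Setup row above.
train_config = {
    "gpus": 2,                       # 2x NVIDIA 2080ti
    "global_batch_size": 32,
    "optimizer": "AdamW",            # Loshchilov and Hutter 2017
    "weight_decay": 1e-4,
    "param_groups": {
        "backbone_lr": 5e-6,         # lower LR for the pre-trained backbone
        "other_lr": 5e-5,            # higher LR for newly added parameters
    },
    "epochs": {"LasHeR": 10, "VTUAV": 5},
    "search_size": (256, 256),       # search regions
    "template_size": (128, 128),     # templates
}

# The backbone LR is 10x smaller than the LR for new parameters,
# a common fine-tuning practice to avoid disturbing pre-trained weights.
lr_ratio = (train_config["param_groups"]["other_lr"]
            / train_config["param_groups"]["backbone_lr"])
print(lr_ratio)  # → 10.0
```

In a typical PyTorch setup this would translate into two optimizer parameter groups (backbone vs. the rest) passed to `torch.optim.AdamW`, though the paper does not state the framework used.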