Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking

Authors: Jiawen Zhu, Huayi Tang, Xin Chen, Xinying Wang, Dong Wang, Huchuan Lu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that AsymTrack offers superior speed-precision trade-offs across different platforms compared to the current state-of-the-art. For instance, AsymTrack-T achieves 60.8% AUC on LaSOT and 224/81/84 FPS on GPU/CPU/AGX, surpassing HiT-Tiny by 6.0% AUC with higher speeds.
Researcher Affiliation | Academia | 1 Dalian University of Technology, Dalian, China; 2 University of Pennsylvania, Philadelphia, USA
Pseudocode | No | The paper describes methods using textual descriptions and block diagrams (e.g., Figure 2, Figure 3, Figure 4) but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | Yes | We used training splits of four datasets for training, including LaSOT (Fan et al. 2019), TrackingNet (Muller et al. 2018), COCO2017 (Lin et al. 2014), and GOT-10k (Huang, Zhao, and Huang 2019). LaSOT (Fan et al. 2019) is a large-scale long-term benchmark comprising 1,400 video sequences. TrackingNet (Muller et al. 2018) is a large-scale short-term benchmark with 511 test video sequences. GOT-10k (Huang, Zhao, and Huang 2019) is a large-scale tracking dataset with over 10,000 video sequences. NFS (Kiani Galoogahi et al. 2017) is a high-frame-rate dataset focused on fast-motion object scenarios. UAV123 (Mueller, Smith, and Ghanem 2016) focuses on the challenges unique to UAV-based tracking. LaSOText (Fan et al. 2021) is an extension of LaSOT for more challenging tracking evaluations. VOT2021 (Kristan et al. 2021) is the challenge benchmark.
Dataset Splits | Yes | We used training splits of four datasets for training, including LaSOT (Fan et al. 2019), TrackingNet (Muller et al. 2018), COCO2017 (Lin et al. 2014), and GOT-10k (Huang, Zhao, and Huang 2019). LaSOT... with 280 sequences reserved for testing.
Hardware Specification | Yes | Tab. 1 details their configurations, also including parameters, FLOPs, and inference speeds across different platforms: GPU (NVIDIA 2080 Ti), CPU (Intel i7-9700KF @ 3.6 GHz), and edge device (Jetson AGX Xavier). We trained the model for 500 epochs using the AdamW (Loshchilov and Hutter 2018) optimizer with an initial learning rate of 4e-4 on 2 NVIDIA A800 GPUs.
Software Dependencies | No | We trained the model for 500 epochs using the AdamW (Loshchilov and Hutter 2018) optimizer... No specific version numbers for other software libraries, frameworks (such as PyTorch or TensorFlow), or programming languages are provided.
Experiment Setup | Yes | The template and search region images are resized to 128×128 and 256×256 for AsymTrack-T and AsymTrack-S, and to 192×192 and 384×384 for AsymTrack-B. We trained the model for 500 epochs using the AdamW (Loshchilov and Hutter 2018) optimizer with an initial learning rate of 4e-4 on 2 NVIDIA A800 GPUs, with each epoch consisting of 60,000 randomly sampled image pairs. The training objective consists of an L1 loss and a GIoU loss (Rezatofighi et al. 2019) LG: L = λ1·L1(B, Bgt) + λG·LG(B, Bgt), where Bgt is the ground truth, λ1 = 5, and λG = 2.
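The combined objective L = λ1·L1 + λG·LG quoted above can be sketched as a minimal, framework-free implementation. This is an illustrative reconstruction, not the paper's code: it assumes boxes in (x1, y1, x2, y2) format, single boxes rather than batched tensors, and the function names are invented for this sketch.

```python
def l1_loss(b, b_gt):
    """Mean absolute error over the four box coordinates."""
    return sum(abs(p - q) for p, q in zip(b, b_gt)) / 4.0

def giou_loss(b, b_gt):
    """GIoU loss (Rezatofighi et al. 2019): 1 - GIoU for two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = b
    bx1, by1, bx2, by2 = b_gt
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection (clamped to zero when the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box; its excess area penalizes non-overlapping boxes.
    area_c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (area_c - union) / area_c
    return 1.0 - giou

def tracking_loss(b, b_gt, lam1=5.0, lam_g=2.0):
    """L = lambda_1 * L1 + lambda_G * L_GIoU, with the paper's weights (5 and 2)."""
    return lam1 * l1_loss(b, b_gt) + lam_g * giou_loss(b, b_gt)
```

A perfect prediction yields zero loss, while disjoint boxes still receive a gradient-carrying penalty via the enclosing-box term, which is the motivation for preferring GIoU over plain IoU in box regression.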