Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking
Authors: Jiawen Zhu, Huayi Tang, Xin Chen, Xinying Wang, Dong Wang, Huchuan Lu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that AsymTrack offers superior speed-precision tradeoffs across different platforms compared to the current state-of-the-arts. For instance, AsymTrack-T achieves 60.8% AUC on LaSOT and 224/81/84 FPS on GPU/CPU/AGX, surpassing HiT-Tiny by 6.0% AUC with higher speeds. |
| Researcher Affiliation | Academia | 1Dalian University of Technology, Dalian, China; 2University of Pennsylvania, Philadelphia, USA |
| Pseudocode | No | The paper describes methods using textual descriptions and block diagrams (e.g., Figure 2, Figure 3, Figure 4) but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | Yes | We used training splits of four datasets for training, including LaSOT (Fan et al. 2019), TrackingNet (Muller et al. 2018), COCO2017 (Lin et al. 2014), and GOT-10k (Huang, Zhao, and Huang 2019). LaSOT (Fan et al. 2019) is a large-scale long-term benchmark comprising 1,400 video sequences. TrackingNet (Muller et al. 2018) is a large-scale short-term benchmark with 511 test video sequences. GOT-10k (Huang, Zhao, and Huang 2019) is a large-scale tracking dataset with over 10,000 video sequences. NFS (Kiani Galoogahi et al. 2017) is a high-frame-rate dataset focused on fast-motion object scenarios. UAV123 (Mueller, Smith, and Ghanem 2016) aims to focus on the challenges unique to UAV-based tracking. LaSOText (Fan et al. 2021) is an extension of LaSOT for more challenging tracking evaluations. VOT2021 (Kristan et al. 2021) is a challenge benchmark. |
| Dataset Splits | Yes | We used training splits of four datasets for training, including LaSOT (Fan et al. 2019), TrackingNet (Muller et al. 2018), COCO2017 (Lin et al. 2014), and GOT-10k (Huang, Zhao, and Huang 2019). LaSOT... with 280 sequences reserved for testing. |
| Hardware Specification | Yes | Tab. 1 details their configurations, also including parameters, FLOPs, and inference speeds across different platforms: GPU (Nvidia 2080ti), CPU (Intel i7-9700KF @ 3.6GHz), and edge device (Jetson AGX Xavier). We trained the model for 500 epochs using the AdamW (Loshchilov and Hutter 2018) optimizer with an initial learning rate of 4e-4 on 2 NVIDIA A800 GPUs |
| Software Dependencies | No | We trained the model for 500 epochs using the AdamW (Loshchilov and Hutter 2018) optimizer... No specific version numbers for other software libraries, frameworks (like PyTorch or TensorFlow), or programming languages are provided. |
| Experiment Setup | Yes | The template and search region images are resized to 128×128 and 256×256 for AsymTrack-T and AsymTrack-S, and to 192×192 and 384×384 for AsymTrack-B. We trained the model for 500 epochs using the AdamW (Loshchilov and Hutter 2018) optimizer with an initial learning rate of 4e-4 on 2 NVIDIA A800 GPUs, with each epoch consisting of 60,000 randomly sampled image pairs. The training objective consists of an L1 loss and a GIoU loss (Rezatofighi et al. 2019) LG: L = λ1·L1(B, Bgt) + λG·LG(B, Bgt), where Bgt is the ground truth, λ1 = 5 and λG = 2. |
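The training objective quoted above combines an L1 box-regression term and a GIoU term with weights λ1 = 5 and λG = 2. A minimal plain-Python sketch of that combined loss for a single box pair (the function names and the (x1, y1, x2, y2) box format are our assumptions, not the authors' released code):

```python
def giou(a, b):
    """Generalized IoU between two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (clamped to zero if the boxes are disjoint).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou = inter / union
    # Smallest enclosing box; GIoU penalizes its empty area.
    c_area = ((max(a[2], b[2]) - min(a[0], b[0]))
              * (max(a[3], b[3]) - min(a[1], b[1])))
    return iou - (c_area - union) / c_area


def tracking_loss(pred, gt, lambda_1=5.0, lambda_g=2.0):
    """L = λ1·L1(B, Bgt) + λG·LG(B, Bgt), with LG = 1 - GIoU."""
    l1 = sum(abs(p - g) for p, g in zip(pred, gt)) / 4.0
    return lambda_1 * l1 + lambda_g * (1.0 - giou(pred, gt))
```

For a perfect prediction both terms vanish (GIoU = 1, so LG = 0); for disjoint boxes GIoU goes negative, so the GIoU term keeps providing a gradient where plain IoU would saturate at zero.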