Efficient Motion Prompt Learning for Robust Visual Tracking
Authors: Jie Zhao, Xin Chen, Yongsheng Yuan, Michael Felsberg, Dong Wang, Huchuan Lu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on seven challenging tracking benchmarks demonstrate that the proposed motion module significantly improves the robustness of vision-based trackers, with minimal training costs and negligible speed sacrifice. |
| Researcher Affiliation | Academia | 1Dalian University of Technology, 2City University of Hong Kong, 3Linköping University. |
| Pseudocode | No | The paper describes methods using textual descriptions and figures, such as Figure 2 for the pipeline, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available at https://github.com/zj5559/Motion-Prompt-Tracking. |
| Open Datasets | Yes | We select the training splits of LaSOT (Fan et al., 2019), GOT10K (Huang et al., 2019), and TrackingNet (Muller et al., 2018) as the training data. ... We compare our methods with baselines and other SOTA trackers on the following seven tracking benchmarks. VOT: VOT2018 (Kristan et al., 2018), VOT2020 (Kristan et al., 2020), and VOT2022 (Kristan et al., 2022). LaSOT and LaSOText: LaSOT (Fan et al., 2019) ... LaSOText (Fan et al., 2021) ... TNL2K (Wang et al., 2021). TrackingNet (Muller et al., 2018). |
| Dataset Splits | Yes | We select the training splits of LaSOT (Fan et al., 2019), GOT10K (Huang et al., 2019), and TrackingNet (Muller et al., 2018) as the training data. For the motion input, we adopt DiMP-18 (Bhat et al., 2019) to generate real tracking predictions for each of the training sequences, and employ reverse sampling, sparse sampling, and CutMix (Yun et al., 2019) for data augmentation. ... Following the VOT protocol, 1k sequences are removed. |
| Hardware Specification | Yes | Models are trained on 2 NVIDIA A100 GPUs, and tested on a single NVIDIA RTX2080Ti GPU. |
| Software Dependencies | No | Our methods are implemented in Python with PyTorch. |
| Experiment Setup | Yes | The length of the historical trajectory T is set to 30 based on experimental results. ... The lightweight fusion decoder is implemented as a two-layer Transformer network. The weight head Head_W and motion head Head_M are implemented by a two-layer MLP, where the hidden size is 256. ... The model is trained for 60 epochs with 60k image pairs per epoch. We set the batch size to 128, and the learning rate is decreased by a factor of 10 after 40 epochs. The initial learning rate and other training settings are set the same as the corresponding baseline trackers. ... L_M = λ_IoU · L_IoU + λ_ℓ1 · L_1 (Eq. 4), where λ_IoU = 2 and λ_ℓ1 = 5 in our experiments. ... L = L_Tr + λ_M (L_M + L_W) (Eq. 6), where λ_M = 1 in our experiments. ... For each layer, the number of attention heads is 8, and the hidden size of the MLP is set to 1024 and 256 for OSTrack / ARTrack and SeqTrack, respectively. ... the best performance of our model is attained when the probability of CutMix is set to 0.5. ... The sparseness of 5 is an optimized choice, which is also our default setting. |
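The loss terms quoted in the experiment setup (Eq. 4 and Eq. 6) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the box representation, the use of 1 − IoU as the IoU loss, and the mean-ℓ1 form are assumptions; only the weights λ_IoU = 2, λ_ℓ1 = 5, and λ_M = 1 come from the paper.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def motion_loss(pred, gt, lam_iou=2.0, lam_l1=5.0):
    """Eq. (4): L_M = lambda_IoU * L_IoU + lambda_l1 * L_1.
    Assumes L_IoU = 1 - IoU and L_1 = mean absolute coordinate error."""
    l_iou = 1.0 - box_iou(pred, gt)
    l_l1 = sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)
    return lam_iou * l_iou + lam_l1 * l_l1

def total_loss(l_tr, l_m, l_w, lam_m=1.0):
    """Eq. (6): L = L_Tr + lambda_M * (L_M + L_W),
    combining the baseline tracking loss L_Tr with the motion
    and weight-head losses."""
    return l_tr + lam_m * (l_m + l_w)
```

With a perfect prediction, `motion_loss` is zero and the total loss reduces to the baseline tracking loss plus the weight-head term, matching how λ_M = 1 simply adds the motion branch on top of the unchanged baseline objective.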