Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

Authors: Xiantao Hu, Ying Tai, Xu Zhao, Chen Zhao, Zhenyu Zhang, Jun Li, Bineng Zhong, Jian Yang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive comparisons on five benchmark datasets illustrate that STTrack achieves state-of-the-art performance across various multimodal tracking scenarios.
Researcher Affiliation | Academia | 1 PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology; 2 Nanjing University; 3 Guangxi Normal University. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using textual explanations and figures (Figure 2 and Figure 3), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code: https://github.com/NJU-PCALab/STTrack
Open Datasets | Yes | The proposed STTrack achieves state-of-the-art performance on five popular multimodal tracking benchmarks, including RGBT234, LasHeR, VisEvent, DepthTrack, and VOT-RGBD2022.
Dataset Splits | Yes | VisEvent is the largest RGB-E dataset, encompassing 500 training video sequences and 320 testing video sequences.
Hardware Specification | Yes | The training was conducted on four NVIDIA Tesla A6000 GPUs over 15 epochs... The tracking speed, tested on an NVIDIA 4090 GPU, is approximately 35.5 FPS.
Software Dependencies | No | AdamW (Loshchilov and Hutter 2018) was employed as the optimizer, with an initial learning rate of 1e-5 for the ViT backbone and 1e-4 for other parameters. This mentions an optimizer but does not specify software versions for programming languages, libraries, or frameworks.
Experiment Setup | Yes | The training was conducted on four NVIDIA Tesla A6000 GPUs over 15 epochs, with each epoch consisting of 60,000 sample pairs and a batch size of 32. AdamW (Loshchilov and Hutter 2018) was employed as the optimizer, with an initial learning rate of 1e-5 for the ViT backbone and 1e-4 for other parameters. After 10 epochs, the learning rate was reduced by a factor of 10.
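The reported schedule (two parameter groups at 1e-5 and 1e-4, both dropped by a factor of 10 after epoch 10 of 15) can be sketched as a small helper. This is a minimal illustration of the schedule as stated in the report, not the authors' code; the constant and function names are invented for clarity.

```python
# Hedged sketch of STTrack's reported training schedule: AdamW with an
# initial LR of 1e-5 for the ViT backbone and 1e-4 for other parameters,
# both reduced by 10x after epoch 10 of a 15-epoch run. Names here are
# illustrative assumptions, not taken from the released repository.

BACKBONE_LR = 1e-5   # initial learning rate for the ViT backbone
OTHER_LR = 1e-4      # initial learning rate for all other parameters
DECAY_EPOCH = 10     # after this epoch the LR drops by a factor of 10
TOTAL_EPOCHS = 15

def lr_at_epoch(base_lr: float, epoch: int) -> float:
    """Learning rate in effect during a given 1-indexed training epoch."""
    return base_lr * (0.1 if epoch > DECAY_EPOCH else 1.0)

# Per-epoch schedule for both parameter groups.
schedule = [
    (epoch, lr_at_epoch(BACKBONE_LR, epoch), lr_at_epoch(OTHER_LR, epoch))
    for epoch in range(1, TOTAL_EPOCHS + 1)
]
```

In a PyTorch implementation the two rates would typically be expressed as AdamW parameter groups, with the decay applied by a step scheduler at epoch 10.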