Temporal Coherent Object Flow for Multi-Object Tracking

Authors: Zikai Song, Run Luo, Lintao Ma, Ying Tang, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Comprehensive experiments on several widely used benchmarks demonstrate the superior performance of our approach. Extensive experiments on several challenging datasets such as MOT17 (Milan et al. 2016), MOT20 (Dendorfer et al. 2020), DanceTrack (Sun et al. 2022) and KITTI (Geiger, Lenz, and Urtasun 2012) exhibit state-of-the-art performance of our approach. In this section, we verify the individual contributions in the ablation study and present the tracking evaluation on several challenging benchmarks, including MOT17 (Milan et al. 2016), MOT20 (Dendorfer et al. 2020), DanceTrack (Sun et al. 2022) and KITTI (Geiger, Lenz, and Urtasun 2012).
Researcher Affiliation Academia 1) Huazhong University of Science and Technology, Wuhan, China; 2) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; 3) La Trobe University, Melbourne, Australia. EMAIL, EMAIL, EMAIL
Pseudocode No The paper describes the methodology using architectural diagrams and textual explanations, but does not include explicit pseudocode or algorithm blocks.
Open Source Code No The text discusses utilizing third-party tools like YOLOX and a ReID component, but it does not contain any explicit statement about releasing the source code for the OFTrack methodology described in this paper, nor does it provide a link to a code repository.
Open Datasets Yes Extensive experiments on several challenging datasets such as MOT17 (Milan et al. 2016), MOT20 (Dendorfer et al. 2020), DanceTrack (Sun et al. 2022) and KITTI (Geiger, Lenz, and Urtasun 2012) exhibit state-of-the-art performance of our approach.
Dataset Splits Yes We ablate our approach using the MOT17 dataset. We split the MOT17 train set into a train-half set and a val-half set as in ByteTrack; all of the ablation experiments are trained on train-half and tested on val-half. For MOT17 and MOT20, which consist only of pedestrians, we adopt the pretrained YOLOX detector from ByteTrack. For KITTI, which consists of driving scenarios, we adopt the COCO-pretrained YOLOX (Ge et al. 2021) and use the KITTI training set to train the model. DanceTrack is a challenging dataset with highly non-linear motion, and we adopted the same training method as KITTI to train our model.
Hardware Specification Yes We train our model on 4 Nvidia Tesla V100 GPUs for a total of 80 epochs. The mini-batch size is set to 16 with each GPU hosting 4 batches. The computation is completed in the HPC Platform of Huazhong University of Science and Technology.
Software Dependencies Yes Our approach is implemented in Python 3.8 with PyTorch 1.10.
Experiment Setup Yes The training samples are directly sampled from the same sequence within an interval length of 6. The input image is resized to 1440×800. The flow head parameters are initialized with Xavier uniform initialization. The AdamW (Loshchilov and Hutter 2018) optimizer is employed with an initial learning rate of 1e-4, and the learning rate decreases according to a cosine function with a final decrease factor of 0.1. We adopt a warm-up learning rate of 1e-5 with a 0.2 warm-up factor on the first 5 epochs. We train our model on 4 Nvidia Tesla V100 GPUs for a total of 80 epochs. The mini-batch size is set to 16 with each GPU hosting 4 batches. Considering the distinct characteristics of different datasets, where some exhibit linear trajectories (e.g., MOT17) while others demonstrate nonlinear trajectories (e.g., DanceTrack), we establish a learnable loss weight α̂ and set the initial weight to 0.5 for linear trajectories and 0.1 for nonlinear trajectories.
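The learning-rate schedule quoted above (warm-up from 1e-5 over the first 5 epochs, then cosine decay from 1e-4 down to a final factor of 0.1) can be sketched as follows. The report does not specify the warm-up interpolation or how the 0.2 warm-up factor enters, so this is an illustrative approximation using a linear ramp, not the authors' implementation:

```python
import math

def lr_at_epoch(epoch, total_epochs=80, warmup_epochs=5,
                base_lr=1e-4, warmup_lr=1e-5, final_factor=0.1):
    """Sketch of the reported schedule: warm-up, then cosine decay.

    Assumptions (not stated in the report): the warm-up ramps linearly
    from warmup_lr to base_lr, and the cosine decay ends at
    final_factor * base_lr at the last epoch.
    """
    if epoch < warmup_epochs:
        # Linear ramp from the warm-up rate to the base rate.
        t = epoch / warmup_epochs
        return warmup_lr + t * (base_lr - warmup_lr)
    # Cosine decay from base_lr down to final_factor * base_lr.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    floor = final_factor * base_lr
    return floor + 0.5 * (base_lr - floor) * (1 + math.cos(math.pi * t))
```

Under these assumptions the rate starts at 1e-5, reaches the base rate of 1e-4 at the end of warm-up, and decays to 1e-5 (0.1 × 1e-4) by epoch 80.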