Cross-View Referring Multi-Object Tracking
Authors: Sijia Chen, En Yu, Wenbing Tao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the CRTrack benchmark verify the effectiveness of our method. [...] In summary, our main contributions are as follows: [...] 3. We propose an end-to-end cross-view referring multi-object tracking method, called CRTracker. We evaluate CRTracker and other methods on the CRTrack benchmark both in-domain and cross-domain. The evaluation results show that CRTracker achieves state-of-the-art performance, fully demonstrating its effectiveness. [...] Experiments: For evaluation, we conduct experiments on the CRTrack benchmark we constructed and follow its evaluation metrics. [...] Ablation Study: To study the role of each part of our method CRTracker, we conduct ablation experiments on the CRTrack benchmark. |
| Researcher Affiliation | Academia | Sijia Chen, En Yu, Wenbing Tao* — Huazhong University of Science and Technology |
| Pseudocode | Yes | Algorithm 1: Prediction Module. Input: frame-to-frame association results, i.e. input tracks of the prediction module T_input; fusion scores S_f. Parameters: fusion scores of the views where the track exists S; fusion score of the track S_f; j-th view V_j; number of views for the track N_V; threshold of average fusion score T_as; threshold of single-view fusion score T_ss; threshold of hit score T_hs; hit score of the track S_HT_i; average hit score s1; single-view hit score s2; single-view miss score s3. Output: output tracks of the prediction module T_output |
| Open Source Code | Yes | Dataset, Code https://github.com/chen-si-jia/CRMOT |
| Open Datasets | Yes | To advance CRMOT task, we construct a cross-view referring multi-object tracking benchmark based on CAMPUS and DIVOTrack datasets, named CRTrack. [...] These sequence scenes come from two cross-view multi-object datasets, DIVOTrack (Hao et al. 2024) and CAMPUS (Xu et al. 2016). |
| Dataset Splits | Yes | Dataset Split. For the DIVOTrack dataset with language descriptions, we evenly selected three scenes as the in-domain test set based on the scene's object density, and the remaining seven scenes as the training set. The CAMPUS dataset with language descriptions is used as the cross-domain test set. In short, the CRTrack benchmark is divided into training set, in-domain test set and cross-domain test set. Specifically, the training set contains 'Floor', 'Gate1', 'Ground', 'Moving', 'Park', 'Shop' and 'Square' scenes, the in-domain test set contains 'Circle', 'Gate2' and 'Side' scenes, and the cross-domain test set contains 'Garden1', 'Garden2' and 'Parking Lot' scenes. |
| Hardware Specification | Yes | Our models are trained for 20 epochs and tested on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions software components like "Swin Transformer (Liu et al. 2021)", "BERT (Devlin et al. 2018)", "CenterNet (Zhou, Wang, and Krähenbühl 2019)", and "Adam optimizer (Kingma and Ba 2014)", but it does not specify version numbers for any ancillary software or programming languages used for implementation (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Our models are trained for 20 epochs and tested on a single NVIDIA RTX 3090 GPU. The feature dimensions of single-view embedding, cross-view embedding, and full embedding are all set to 512. During the training phase, we use the Adam optimizer (Kingma and Ba 2014); the initial learning rate is set to 1×10⁻⁴, the batch size to 12, and the feature fusion weight α in Formula (5) to 0.01. During the inference phase, the score fusion weight β in Formula (9) is set to 0.1, the threshold of average fusion score T_as to 0.5, the threshold of single-view fusion score T_ss to 0.75, the threshold of hit score T_hs to 30, the average hit score s1 to 3, the single-view hit score s2 to 3, and the single-view miss score s3 to 1. |
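To make the quoted thresholds concrete, the sketch below reconstructs one plausible reading of Algorithm 1's hit-score bookkeeping from its parameter list alone (T_as = 0.5, T_ss = 0.75, T_hs = 30, s1 = s2 = 3, s3 = 1, as reported above). The update rule and function names here are our assumptions, not the authors' released code; the paper's exact control flow may differ.

```python
# Hedged sketch of the prediction module's score-based track filtering,
# reconstructed from Algorithm 1's parameter list. The specific update
# rule below is an assumption, not the authors' published implementation.

T_AS = 0.5   # threshold of average fusion score (paper value)
T_SS = 0.75  # threshold of single-view fusion score (paper value)
T_HS = 30    # threshold of hit score (paper value)
S1, S2, S3 = 3, 3, 1  # average-hit / single-view-hit / single-view-miss scores


def update_hit_score(hit_score: float, view_fusion_scores: list[float]) -> float:
    """One plausible per-frame update of a track's hit score S_HT.

    view_fusion_scores holds the track's fusion score in each view where it
    currently exists (S in Algorithm 1's notation, with N_V = len(...)).
    """
    avg = sum(view_fusion_scores) / len(view_fusion_scores)
    if avg >= T_AS:
        # Strong cross-view agreement: reward with the average hit score.
        hit_score += S1
    else:
        # Weak average: fall back to per-view evidence.
        for s in view_fusion_scores:
            if s >= T_SS:
                hit_score += S2   # confident single-view hit
            else:
                hit_score -= S3   # single-view miss penalty
    return hit_score


def keep_track(hit_score: float) -> bool:
    """A track is emitted to T_output once its hit score clears T_hs."""
    return hit_score >= T_HS
```

Under this reading, a track seen consistently across views accumulates +3 per frame and survives after ten frames, while a track with mixed per-view evidence gains or loses ground view by view.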