Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking

Authors: Mattia Segu, Luigi Piccinelli, Siyuan Li, Yung-Hsu Yang, Luc Van Gool, Bernt Schiele

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. In this section, we present experimental results to validate SambaMOTR. We describe our evaluation protocol (Sec. 5.1) and report implementation details (Sec. 5.2). We then compare SambaMOTR to previous state-of-the-art methods (Sec. 5.3) and conduct an ablation study (Sec. 5.4) on the method components.
Researcher Affiliation: Academia. 1) ETH Zurich, 2) INSAIT, Sofia University St. Kliment Ohridski, 3) Max Planck Institute for Informatics, Saarland Informatics Campus. All listed institutions are academic research institutions.
Pseudocode: No. The paper describes methods and processes in narrative text and uses figures to illustrate the architecture (Figure 2) and synchronized SSMs (Figure 3), but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code: No. The paper mentions "https://sambamotr.github.io/", a project demonstration page, and "https://anonymous-samba.github.io/", an anonymous project page for review. Neither explicitly states that the source code for the methodology described in the paper is released at these locations, nor is either explicitly described as a code repository.
Open Datasets: Yes. We validate SambaMOTR on the challenging DanceTrack (Sun et al., 2022), SportsMOT (Cui et al., 2023), and BFT (Zheng et al., 2024) datasets. Owing to our contributions, we establish a new state of the art on all datasets. ... Additionally, we introduce an effective technique for dealing with uncertain observations (MaskObs) and an efficient training recipe to scale SambaMOTR to longer sequences. By modeling long-range dependencies and interactions among tracked objects, SambaMOTR implicitly learns to track objects accurately through occlusions without any handcrafted heuristics. Our approach significantly surpasses the prior state of the art on the DanceTrack, BFT, and SportsMOT datasets.
Dataset Splits: Yes. On DanceTrack (Sun et al., 2022), we train SambaMOTR for 15 epochs on the training set and drop the learning rate by a factor of 10 at the 10th epoch. On BFT (Zheng et al., 2024), we train for 20 epochs and drop the learning rate after 10 epochs. On SportsMOT (Cui et al., 2023), we train for 18 epochs and drop the learning rate after the 8th and 12th epochs.
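The per-dataset schedules quoted above (fixed epoch counts, learning rate divided by 10 at set milestones) can be sketched as follows. This is a minimal illustration, not the authors' released code; the names SCHEDULES, BASE_LR, and lr_at_epoch are hypothetical.

```python
# Hedged sketch of the step learning-rate schedules described in the report.
SCHEDULES = {
    # dataset: (total epochs, epochs at which the LR is divided by 10)
    "DanceTrack": (15, [10]),
    "BFT": (20, [10]),
    "SportsMOT": (18, [8, 12]),
}

BASE_LR = 2.0e-4  # initial AdamW learning rate reported in the paper


def lr_at_epoch(dataset: str, epoch: int) -> float:
    """Learning rate in effect at a given 0-indexed epoch."""
    total_epochs, milestones = SCHEDULES[dataset]
    assert 0 <= epoch < total_epochs, "epoch outside the training schedule"
    drops = sum(epoch >= m for m in milestones)  # milestones already passed
    return BASE_LR / (10 ** drops)
```

For example, on SportsMOT the rate starts at 2.0e-4, falls to 2.0e-5 after epoch 8, and to 2.0e-6 after epoch 12.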
Hardware Specification: Yes. We run our experiments on 8 NVIDIA RTX 4090 GPUs, with batch size 1 per GPU.
Software Dependencies: No. The paper mentions using Deformable-DETR (Zhu et al., 2020) and ResNet-50 (He et al., 2016) as architectures, and the AdamW optimizer (Loshchilov & Hutter, 2017), but does not specify any software libraries with version numbers (e.g., PyTorch 1.x, CUDA 11.x, Python 3.x).
Experiment Setup: Yes. Each batch element contains a video clip with 10 frames, and we compute and backpropagate the gradients only over the last 5. We sample uniformly spaced frames at random intervals from 1 to 10 within each clip. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with an initial learning rate of 2.0 × 10^-4. For simplicity, τ_det = τ_track = τ_mask = 0.5. N_miss is 35, 20, and 50 on DanceTrack, BFT, and SportsMOT, respectively, due to different dataset dynamics. On DanceTrack (Sun et al., 2022), we train SambaMOTR for 15 epochs on the training set and drop the learning rate by a factor of 10 at the 10th epoch. On BFT (Zheng et al., 2024), we train for 20 epochs and drop the learning rate after 10 epochs. On SportsMOT (Cui et al., 2023), we train for 18 epochs and drop the learning rate after the 8th and 12th epochs.
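The clip-sampling recipe quoted above (10 uniformly spaced frames at a random interval from 1 to 10, with gradients only over the last 5) can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the paper's code; sample_clip_indices and the constants are hypothetical names, and the no-grad prefix is indicated only as a comment since the actual training loop is not published.

```python
import random

CLIP_LEN = 10    # frames per clip
GRAD_FRAMES = 5  # gradients are backpropagated only over the last 5 frames


def sample_clip_indices(num_frames: int, rng: random.Random) -> list[int]:
    """Sample CLIP_LEN uniformly spaced frame indices from a video.

    The spacing is drawn uniformly from 1..10, clamped so the clip
    fits inside the video; the start frame is then drawn uniformly.
    """
    assert num_frames >= CLIP_LEN, "video too short for a clip"
    max_interval = (num_frames - 1) // (CLIP_LEN - 1)
    interval = min(rng.randint(1, 10), max_interval)
    span = (CLIP_LEN - 1) * interval
    start = rng.randint(0, num_frames - 1 - span)
    return [start + i * interval for i in range(CLIP_LEN)]


# In training, the first CLIP_LEN - GRAD_FRAMES frames would be processed
# without tracking gradients (e.g. under torch.no_grad()), and the loss
# would be computed and backpropagated over the last GRAD_FRAMES frames only.
```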