Event2Tracking: Reconstructing Multi-Agent Soccer Trajectories Using Long-Term Multimodal Context

Authors: Harry Hughes, Michael Horton, Xinyu Wei, Harshala Gammulle, Clinton Fookes, Sridha Sridharan, Patrick Lucey

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method empirically using a reconstruction loss metric. Compared to state-of-the-art approaches, our method substantially improves the accuracy of the ball's and players' reconstructed trajectories.
Researcher Affiliation | Collaboration | Harry Hughes (1,2), Michael Horton (1), Xinyu Wei (1), Harshala Gammulle (2), Clinton Fookes (2), Sridha Sridharan (2), Patrick Lucey (1); (1) Stats Perform, (2) Queensland University of Technology
Pseudocode | No | The paper describes the Event2Tracking model architecture and its components (Temporal Localisation, Event Encoder, Tracking Decoder) and illustrates it in Figure 3, but it does not provide any explicitly labelled pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository.
Open Datasets | No | A large dataset was used in the experiments, with 700 professional soccer games for training and 52 games for evaluation. Each game had a paired dataset of event data m_event, broadcast tracking m_broadcast, and in-venue tracking m_in-venue. The ball's trajectory is considered to be fully occluded in the broadcast tracking dataset, reflecting the difficulty of tracking the ball continuously and accurately throughout an entire game with a heavily occluded monocular camera. An inventory and justification of this dataset is provided in Appendix 1.
Dataset Splits | Yes | A large dataset was used in the experiments, with 700 professional soccer games for training and 52 games for evaluation.
Hardware Specification | Yes | All models were trained for 16 hours on a cluster of 4 A10 GPUs with a learning rate of 1e-4 using the Adam optimiser (Kingma and Ba 2014) (with default exponential decay parameters).
Software Dependencies | No | The paper mentions using the Adam optimiser but does not provide version numbers for any software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | All models were trained separately using 10-, 20-, 30-, 45-, and 60-second context windows to quantify how each approach generalised to longer trajectories. The broadcast and in-venue tracking streams were downsampled to 5 Hz. Each attention module uses a hidden dimensionality of 128, a feedforward dimensionality of 512, and 4 attention heads. For the Event2Tracking module, the event encoder and tracking decoder each have N = 4 layers. During training, the loss incurred in predicting the ball's location was weighted by a factor of 11, reflecting the ball's relative importance. All models were trained for 16 hours on a cluster of 4 A10 GPUs with a learning rate of 1e-4 using the Adam optimiser (Kingma and Ba 2014) (with default exponential decay parameters).
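The paired data streams (m_event, m_broadcast, m_in-venue) and the ball-weighted reconstruction loss described in the rows above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the field names, the (T, A, 2) tensor layout, the NaN-for-occlusion convention, and the placement of the ball as the last agent are all assumptions; only the 5 Hz sampling, the three paired streams, and the factor-of-11 ball weight come from the table above.

```python
from dataclasses import dataclass
import numpy as np

BALL_WEIGHT = 11.0  # loss weight on the ball's location, per the setup above


@dataclass
class GameRecord:
    """Hypothetical per-game record pairing the three data streams."""
    m_event: list             # discrete match events (passes, shots, ...)
    m_broadcast: np.ndarray   # (T, A, 2) broadcast tracking at 5 Hz; occluded entries as NaN
    m_in_venue: np.ndarray    # (T, A, 2) in-venue ground-truth tracking at 5 Hz


def weighted_reconstruction_loss(pred: np.ndarray, target: np.ndarray,
                                 ball_index: int = -1) -> float:
    """Mean squared positional error with the ball up-weighted by BALL_WEIGHT.

    pred, target: (T, A, 2) arrays of T frames, A agents, (x, y) coordinates.
    Assumes the ball is the agent at `ball_index` (an illustrative convention).
    """
    weights = np.ones(pred.shape[1])
    weights[ball_index] = BALL_WEIGHT
    sq_err = ((pred - target) ** 2).sum(axis=-1)  # (T, A) squared distance per agent
    return float((sq_err * weights).mean())
```

Under this weighting, a given positional error on the ball contributes 11 times as much to the loss as the same error on any single player, which matches the stated intent of emphasising ball accuracy.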