ASTRA: A Scene-aware Transformer-based Model for Trajectory Prediction

Authors: Izzeddin Teeti, Aniket Thomas, Munish Monga, Sachin Kumar Giroh, Uddeshya Singh, Andrew Bradley, Biplab Banerjee, Fabio Cuzzolin

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our methodology underwent evaluation using renowned benchmark trajectory prediction datasets: ETH (Pellegrini et al., 2009a), UCY (Lerner et al., 2007), and the PIE dataset (Rasouli et al., 2019). The empirical findings highlight ASTRA's outperforming the latest state-of-the-art methodologies. Notably, our method showcased significant improvements of 27% on the deterministic and 10% on the stochastic settings of the ETH and UCY datasets, and 26% on PIE.
Researcher Affiliation | Academia | Izzeddin Teeti (EMAIL), Visual Artificial Intelligence Laboratory (VAIL), Oxford Brookes University; Aniket Thomas (EMAIL), Indian Institute of Technology Bombay; Munish Monga (EMAIL), Indian Institute of Technology Bombay; Sachin Kumar (EMAIL), Indian Institute of Technology Bombay; Uddeshya Singh (EMAIL), Indian Institute of Technology Bombay; Andrew Bradley (EMAIL), Oxford Brookes University; Biplab Banerjee (EMAIL), Center of Machine Intelligence & Data Science, Indian Institute of Technology Bombay; Fabio Cuzzolin (EMAIL), Visual Artificial Intelligence Laboratory (VAIL), Oxford Brookes University
Pseudocode | No | The paper describes the model architecture and components using textual descriptions and diagrams (e.g., Figure 2, Figure 3), but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that source code for the described methodology will be released nor includes any links to a code repository.
Open Datasets | Yes | For a comprehensive evaluation, we benchmarked our model on three trajectory prediction datasets, namely ETH (Pellegrini et al., 2009a), UCY (Lerner et al., 2007), and the PIE dataset (Rasouli et al., 2019).
Dataset Splits | Yes | ETH-UCY (bird's-eye view): ETH and UCY offer a bird's-eye view of pedestrian dynamics in urban settings, comprising five datasets with 1,536 pedestrians across four scenes. For evaluation, we used their standard protocol, the leave-one-out strategy, observing eight time steps (3.2 s) and predicting the following 12 steps (4.8 s). PIE (ego-vehicle view): ... A total of 1,842 pedestrian samples are considered with the following split: training (50%), validation (40%), and testing (10%) (Rasouli et al., 2019).
Hardware Specification | Yes | All experiments were conducted on an NVIDIA DGX A100 system with 8 GPUs, each equipped with 80 GB of memory.
Software Dependencies | No | The paper mentions the "AdamW optimizer" and a "cosine annealing scheduler" but does not name general software dependencies such as Python, PyTorch, or CUDA with version numbers.
Experiment Setup | Yes | The key architectural hyperparameters used in our model are as follows: spatial embedding dimension (Φ_Spatial ∈ R^16), U-Net scene latent representation (Ψ_Scene ∈ R^16), temporal embedding dimension (Φ_Temporal ∈ R^8), and random walk embedding (Φ_Social ∈ R^8). The transformer encoder consists of a single layer with two attention heads and a dropout rate of 0.2. For training, we employ the AdamW optimizer with a weight decay of 5×10^-4 over 200 epochs. A cosine annealing scheduler is used, starting with an initial learning rate of 1×10^-3. ... We found α = 4 and β = 1 to be the optimal values.
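The training schedule quoted above (AdamW with cosine annealing from an initial learning rate of 1×10^-3 over 200 epochs) can be sketched with the standard cosine annealing formula. This is an illustrative reconstruction, not the authors' code: the minimum learning rate of 0 and per-epoch stepping are assumptions not stated in the paper.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=200, lr_max=1e-3, lr_min=0.0):
    """Standard cosine annealing schedule:
        lr(t) = lr_min + (lr_max - lr_min) * (1 + cos(pi * t / T)) / 2
    lr_max=1e-3 and T=200 match the values quoted in the paper;
    lr_min=0 is an assumption.
    """
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_annealing_lr(0))    # 0.001 at the start of training
print(cosine_annealing_lr(100))  # ~0.0005 halfway through
print(cosine_annealing_lr(200))  # ~0.0 at the end
```

This reproduces the behavior of PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=200` and `eta_min=0`, which is the likely (but unconfirmed) implementation given the paper's stated setup.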