ASTRA: A Scene-aware Transformer-based Model for Trajectory Prediction
Authors: Izzeddin Teeti, Aniket Thomas, Munish Monga, Sachin Kumar Giroh, Uddeshya Singh, Andrew Bradley, Biplab Banerjee, Fabio Cuzzolin
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our methodology underwent evaluation using renowned benchmark trajectory prediction datasets ETH Pellegrini et al. (2009a), UCY Lerner et al. (2007), and the PIE dataset Rasouli et al. (2019). The empirical findings highlight ASTRA's outperforming the latest state-of-the-art methodologies. Notably, our method showcased significant improvements of 27% on the deterministic and 10% on the stochastic settings of the ETH and UCY datasets and 26% on PIE. |
| Researcher Affiliation | Academia | Izzeddin Teeti, EMAIL, Visual Artificial Intelligence Laboratory (VAIL), Oxford Brookes University; Aniket Thomas, EMAIL, Indian Institute of Technology Bombay; Munish Monga, EMAIL, Indian Institute of Technology Bombay; Sachin Kumar, EMAIL, Indian Institute of Technology Bombay; Uddeshya Singh, EMAIL, Indian Institute of Technology Bombay; Andrew Bradley, EMAIL, Oxford Brookes University; Biplab Banerjee, EMAIL, Center of Machine Intelligence & Data Science, Indian Institute of Technology Bombay; Fabio Cuzzolin, EMAIL, Visual Artificial Intelligence Laboratory (VAIL), Oxford Brookes University |
| Pseudocode | No | The paper describes the model architecture and components using textual descriptions and diagrams (e.g., Figure 2, Figure 3), but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the methodology described, nor does it include any links to a code repository. |
| Open Datasets | Yes | For a comprehensive evaluation, we benchmarked our model on three trajectory prediction datasets; namely, ETH Pellegrini et al. (2009a), UCY Lerner et al. (2007), and PIE dataset Rasouli et al. (2019). |
| Dataset Splits | Yes | ETH-UCY (Bird's Eye View): ETH and UCY offer a bird's-eye view of pedestrian dynamics in urban settings, including five datasets with 1,536 pedestrians across four scenes. For evaluation, we used their standard protocol; leave-one-out strategy, observing eight time steps (3.2s) and predicting the following 12 steps (4.8s). PIE (Ego-Vehicle View): ... A total of 1,842 pedestrian samples are considered with the following split: Training (50%), Validation (40%) and Testing (10%) Rasouli et al. (2019). |
| Hardware Specification | Yes | All experiments were conducted on an NVIDIA DGX A100 system with 8 GPUs, each equipped with 80 GB of memory. |
| Software Dependencies | No | The paper mentions "AdamW optimizer" and "cosine annealing scheduler" but does not specify any general software dependencies like Python, PyTorch, or CUDA with version numbers. |
| Experiment Setup | Yes | The key architectural hyperparameters used in our model are as follows: spatial embedding dimension ($\Phi_{\text{Spatial}} \in \mathbb{R}^{16}$), U-Net scene latent representation ($\Psi_{\text{Scene}} \in \mathbb{R}^{16}$), temporal embedding dimension ($\Phi_{\text{Temporal}} \in \mathbb{R}^{8}$), and random walk embedding ($\Phi_{\text{Social}} \in \mathbb{R}^{8}$). The transformer encoder consists of a single layer with two attention heads and a dropout rate of 0.2. For training, we employ the AdamW optimizer with a weight decay of $5 \times 10^{-4}$ over 200 epochs. A cosine annealing scheduler is used, starting with an initial learning rate of $1 \times 10^{-3}$. ... we found that α = 4 and β = 1, as the optimal values |
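
The ETH-UCY protocol and the learning-rate schedule quoted in the table can be illustrated with a small standard-library sketch. This is not the authors' code, since the paper releases none. The subset names (`eth`, `hotel`, `univ`, `zara1`, `zara2`), the helper names, and the minimum learning rate of 0 are assumptions made for illustration. Only the 8/12 step counts, the 3.2 s observation horizon, the initial rate of 1e-3, and the 200 epochs come from the table.

```python
import math

# Assumed names for the five ETH-UCY subsets; the table only says "five datasets".
SUBSETS = ["eth", "hotel", "univ", "zara1", "zara2"]

OBS_LEN, PRED_LEN = 8, 12        # observed / predicted steps, from the table
STEP_SECONDS = 3.2 / OBS_LEN     # 0.4 s per step, implied by 8 steps = 3.2 s


def leave_one_out(subsets):
    """Yield (train_subsets, test_subset) pairs, holding out one subset per fold."""
    for held_out in subsets:
        yield [s for s in subsets if s != held_out], held_out


def split_windows(track, obs_len=OBS_LEN, pred_len=PRED_LEN):
    """Slice one trajectory (a list of (x, y) points) into (observed, future) pairs."""
    total = obs_len + pred_len
    return [
        (track[i:i + obs_len], track[i + obs_len:i + total])
        for i in range(len(track) - total + 1)
    ]


def cosine_lr(epoch, base_lr=1e-3, total_epochs=200, min_lr=0.0):
    """Cosine-annealed learning rate; min_lr=0.0 is an assumption, not from the paper."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cos)


if __name__ == "__main__":
    for train, test in leave_one_out(SUBSETS):
        print(f"train on {train}, test on {test}")
    # A hypothetical straight-line track, 25 steps long.
    track = [(0.1 * t, 0.0) for t in range(25)]
    print(len(split_windows(track)), "windows from a 25-step track")
    print([cosine_lr(e) for e in (0, 100, 200)])
```

Each fold trains on four subsets and tests on the fifth, and every window pairs 8 observed steps with the 12 steps that follow, matching the 3.2 s / 4.8 s horizons quoted above.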