Interactive Adjustment for Human Trajectory Prediction with Individual Feedback
Authors: Jianhua Sun, Yuxuan Li, Liang Chai, Cewu Lu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on representative prediction methods and widely-used benchmarks, we demonstrate the great value of individual feedback and the superior effectiveness of proposed interactive adjustment network. We conduct exhaustive experiments on three widely-used trajectory prediction benchmarks (Pellegrini et al., 2009; Leal-Taixé et al., 2014; Zhou et al., 2012; linouk23, 2016) with 6 representative prediction models (Gupta et al., 2018; Shi et al., 2021; Pang et al., 2021; Xu et al., 2022a; Shi et al., 2023; Bae et al., 2024), including state-of-the-art (Bae et al., 2024). The results demonstrate the great value of individual feedback, the superior effectiveness of IAN and the significant performance boost on trajectory prediction. |
| Researcher Affiliation | Academia | Jianhua Sun, Yuxuan Li, Liang Chai, Cewu Lu — Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1 Candidate Filtering Algorithm 2 IAN Training Algorithm 3 IAN Inference |
| Open Source Code | No | The paper references third-party implementations and official models for the base prediction models used in their experiments, such as 'We use Group Net on CVAE with their official implementation (sjtuxcx, 2022)' and 'We use official models for testing if available, otherwise we train models according to the official implementations.' However, there is no explicit statement or link indicating that the authors have released the source code for their proposed Interactive Adjustment Network (IAN). |
| Open Datasets | Yes | We conduct experiments on the following three widely-used benchmarks. ETH (Pellegrini et al., 2009)/UCY (Leal-Taixé et al., 2014) Dataset is one of the most commonly used benchmarks. Grand Central Station Dataset (GCS) (Zhou et al., 2012) contains trajectories extracted from a 30-min video recorded at the Grand Central Station. NBA Sports VU Dataset (NBA) (linouk23, 2016) contains trajectories of all ten players in real NBA games. |
| Dataset Splits | Yes | ETH (Pellegrini et al., 2009)/UCY (Leal-Taixé et al., 2014) Dataset... We follow Alahi et al. (2016) for the leave-one-out evaluation and observation/prediction horizon. Grand Central Station Dataset (GCS) (Zhou et al., 2012)... We split the first 80% of the dataset for training, and the rest 20% for test. NBA Sports VU Dataset (NBA) (linouk23, 2016)... We select 50k samples in total from the 2015-2016 season with a split of 65%, 10%, 25% as training, validation and testing data following Li et al. (2020). To tackle this problem, we draw on the idea of K-fold cross validation. Specifically, the training set of P is first split into K folds... We use K = 5 in our experiments. |
| Hardware Specification | Yes | Under our test environments with a single RTX3090, IAN takes an average of 0.02 seconds to produce the adjusted predictions for an agent. |
| Software Dependencies | No | The paper mentions using 'LSTM networks' for encoders and 'Adam optimizer' for training, but it does not specify any version numbers for these or other software libraries (e.g., PyTorch, TensorFlow) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | In our implementation, we use LSTM networks for all the encoders in IAN, and the output sizes of these encoders are all 64. Each proposal generated by the adjuster has a dimension of 32. The adjuster is a four-layer mlp with embedding sizes of (128, 256, 512, 640, 640), and individual feedback F serves as the biases of its last two layers for low computational cost and good convergence. Accordingly, the mlp used during feedback aggregation has 3 layers and an output size of 1280. The confidence network first encodes the input trajectory into a deep feature, then concatenates it with the proposal. The concatenated feature is fed to a triple-layer mlp with input size of 96 and outputs a single number as the confidence score. We use η = N = 200 for all prediction models except TUTR, where we set η = N = L (notation L indicates the number of general motion modes in the TUTR paper) for ETH/UCY, and use η = N = 80 for the other datasets. During collection of the training set, the original training set is split into K = 5 folds. The network is trained for 30 epochs using Adam optimizer. |
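The K-fold candidate-collection scheme quoted under Dataset Splits (split the base predictor's training set into K = 5 folds, train on K−1 folds, and collect predictions with feedback on the held-out fold) can be sketched as below. This is a minimal index-splitting illustration under our own naming; it is not the authors' released code:

```python
import random

def kfold_candidate_splits(n_samples, k=5, seed=0):
    """Split sample indices into k disjoint folds; for each fold, return
    (train_idx, heldout_idx): the base predictor P would be trained on
    train_idx, and prediction/feedback pairs collected on heldout_idx."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        heldout = folds[i]
        train = [s for j in range(k) if j != i for s in folds[j]]
        splits.append((train, heldout))
    return splits

# Each sample is held out exactly once, so every training trajectory
# eventually contributes a candidate prediction for IAN's training set.
splits = kfold_candidate_splits(100, k=5)
```

Each of the five splits holds out a disjoint 20% of the data, mirroring leave-one-fold-out collection so that candidates are never produced by a predictor that saw them during training.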
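The layer sizes quoted under Experiment Setup can be sanity-checked with a small NumPy sketch: encoder features of size 64, a four-layer adjuster mlp over sizes (128, 256, 512, 640, 640) whose last two layers receive the aggregated feedback as biases (hence the 1280 = 640 + 640 feedback output), and a confidence mlp over the 96-dimensional concatenation of trajectory feature (64) and proposal (32). The hidden widths of the feedback and confidence mlps are not stated in the table, so the values below are placeholders; the wiring is our reading of the setup, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP (ReLU between layers) returning a forward fn.
    If extra_bias is given, its two vectors are added as biases of the
    last two layers, matching how feedback F enters the adjuster."""
    Ws = [rng.standard_normal((a, b)) * 0.01
          for a, b in zip(sizes[:-1], sizes[1:])]
    def forward(x, extra_bias=None):
        for i, W in enumerate(Ws):
            x = x @ W
            if extra_bias is not None and i >= len(Ws) - 2:
                x = x + extra_bias[i - (len(Ws) - 2)]
            if i < len(Ws) - 1:
                x = np.maximum(x, 0.0)
        return x
    return forward

ENC_DIM, PROPOSAL_DIM = 64, 32

# Adjuster: four layers, sizes (128, 256, 512, 640, 640).
adjuster = mlp([128, 256, 512, 640, 640])

# Feedback aggregation: 3 layers, output 1280, split into two 640-d
# bias vectors for the adjuster's last two layers (hidden width assumed).
feedback_mlp = mlp([ENC_DIM, 512, 512, 1280])

# Confidence net: 64-d trajectory feature + 32-d proposal -> input 96,
# triple-layer mlp to a single score (hidden widths assumed).
confidence = mlp([ENC_DIM + PROPOSAL_DIM, 64, 32, 1])

f = feedback_mlp(rng.standard_normal((1, ENC_DIM)))
out = adjuster(rng.standard_normal((1, 128)),
               extra_bias=(f[:, :640], f[:, 640:]))
score = confidence(rng.standard_normal((1, 96)))
```

The point of the sketch is only that the quoted dimensions compose: 1280 feedback units exactly cover the two 640-wide bias slots, and 64 + 32 gives the stated 96-dimensional confidence input.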