Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic
Authors: Mikael Henaff, Alfredo Canziani, Yann LeCun
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach using a large-scale observational dataset of driving behavior recorded from traffic cameras, and show that we are able to learn effective driving policies from purely observational data, with no environment interaction. |
| Researcher Affiliation | Collaboration | Mikael Henaff (Courant Institute, New York University; Microsoft Research, NYC); Alfredo Canziani (Courant Institute, New York University); Yann LeCun (Courant Institute, New York University; Facebook AI Research) |
| Pseudocode | No | The paper describes its algorithms and training steps in prose and diagrams (Figures 2, 3, and 10), but contains no formal pseudocode blocks or environments labeled as algorithms. |
| Open Source Code | Yes | Code and additional video results for the model predictions and learned policies can be found at the following URL: https://sites.google.com/view/model-predictive-driving/home. |
| Open Datasets | Yes | The Next Generation Simulation program's Interstate 80 (NGSIM I-80) dataset (Halkias & Colyar, 2006) consists of 45 minutes of recordings from traffic cameras mounted over a stretch of highway. |
| Dataset Splits | Yes | This yields a total of 5596 car trajectories, which we split into training (80%), validation (10%), and testing (10%) sets. (A hypothetical reconstruction of this split appears after the table.) |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU model numbers, memory) were found in the paper. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym (Brockman et al., 2016)', 'Adam (Kingma & Ba, 2014)', 'Proximal Policy Optimization (PPO) (Schulman et al., 2017)', and 'OpenAI Baselines'. It does not provide version numbers for any of these, nor for Python or PyTorch. |
| Experiment Setup | Yes | Our model was trained using Adam (Kingma & Ba, 2014) with learning rate 0.0001 and minibatches of size 64, unrolled for 20 time steps, and with dropout (p_dropout = 0.1) at every layer, which was necessary for computing the epistemic uncertainty cost when training the policy network. (Hedged sketches of this setup appear after the table.) |
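
The Dataset Splits row states the split only as percentages of the 5596 NGSIM I-80 trajectories. Below is a minimal sketch of how such an 80/10/10 trajectory-level split could be reproduced; the `split_trajectories` helper, the seed, and the shuffling procedure are assumptions, not details given in the paper.

```python
import random

# Hypothetical reconstruction of the paper's 80/10/10 trajectory split.
# `trajectories` stands in for the 5596 NGSIM I-80 car trajectories; the
# paper does not specify a seed or shuffling procedure.
def split_trajectories(trajectories, seed=0):
    order = list(range(len(trajectories)))
    random.Random(seed).shuffle(order)
    n = len(order)
    n_train = int(0.8 * n)   # 80% training
    n_valid = int(0.1 * n)   # 10% validation; the remainder (~10%) is test
    train = [trajectories[i] for i in order[:n_train]]
    valid = [trajectories[i] for i in order[n_train:n_train + n_valid]]
    test = [trajectories[i] for i in order[n_train + n_valid:]]
    return train, valid, test
```

Splitting at the trajectory level (rather than the frame level) matters here: frames within one trajectory are strongly correlated, so a frame-level split would leak test information into training.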
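The Experiment Setup row reports concrete optimizer and dropout settings. The sketch below wires the reported values (Adam with learning rate 0.0001, dropout p = 0.1 at every layer) into a placeholder PyTorch model and adds a Monte-Carlo-dropout estimate of epistemic uncertainty; the `ForwardModel` architecture, layer sizes, and `epistemic_uncertainty` helper are illustrative assumptions, since the paper's actual network and uncertainty cost are defined in the paper itself.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the paper's action-conditional forward model.
# Only the hyperparameters (lr = 0.0001, p_dropout = 0.1) come from the
# paper; the layer sizes and structure here are illustrative.
class ForwardModel(nn.Module):
    def __init__(self, state_dim=64, action_dim=2, p_dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Dropout(p_dropout),              # dropout at every layer
            nn.Linear(256, 256), nn.ReLU(),
            nn.Dropout(p_dropout),
            nn.Linear(256, state_dim),          # predicts the next state
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

fwd_model = ForwardModel()
# Reported settings: Adam, lr 0.0001; training used minibatches of 64
# sequences unrolled for 20 time steps.
optimizer = torch.optim.Adam(fwd_model.parameters(), lr=1e-4)

def epistemic_uncertainty(model, state, action, n_samples=10):
    """Monte-Carlo-dropout spread over several stochastic forward passes.

    A common way to realize the dropout-based epistemic uncertainty the
    paper describes; the paper's exact cost may differ in detail.
    """
    model.train()  # keep nn.Dropout layers stochastic at evaluation time
    preds = torch.stack([model(state, action) for _ in range(n_samples)])
    return preds.var(dim=0).mean()
```

Keeping the dropout layers active outside of training is the point of the row's remark: without dropout in every layer, the stochastic forward passes would all agree and the epistemic uncertainty cost would collapse to zero.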
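Building on the previous sketch (it reuses `fwd_model` and `epistemic_uncertainty`), the following shows how a policy could be trained by unrolling through the learned forward model for the reported 20 time steps while penalizing the epistemic uncertainty of the model's predictions. The policy network, the task cost, and the weight `lam` on the uncertainty term are hypothetical placeholders; this is a sketch of the general uncertainty-regularized scheme, not the paper's exact objective.

```python
# Placeholder policy and task cost; the paper defines its own versions.
policy = nn.Sequential(nn.Linear(64, 2), nn.Tanh())
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
task_cost = lambda s: s.pow(2).mean()   # illustrative stand-in cost

def policy_update(s0, T=20, lam=0.5):
    """One uncertainty-regularized policy update (MPUR-style sketch)."""
    s, loss = s0, 0.0
    for _ in range(T):                  # unroll the reported 20 time steps
        a = policy(s)
        s = fwd_model(s, a)             # predicted next state
        loss = loss + task_cost(s) + lam * epistemic_uncertainty(fwd_model, s, a)
    policy_opt.zero_grad()
    loss.backward()   # backprop through the unrolled model into the policy
    policy_opt.step()
    return loss.item()

# Example: one update from a batch of 64 initial states (reported batch size).
loss = policy_update(torch.randn(64, 64))
```

In practice the forward model's parameters would be frozen during policy training; only the policy optimizer steps here, so the model weights are unchanged even though gradients flow through them.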