Robust Autonomy Emerges from Self-Play
Authors: Marco Cusumano-Towner, David Hafner, Alexander Hertzberg, Brody Huval, Aleksei Petrenko, Eugene Vinitsky, Erik Wijmans, Taylor W. Killian, Stuart Bowers, Ozan Sener, Philipp Kraehenbuehl, Vladlen Koltun
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The resulting policy achieves state-of-the-art performance on three independent autonomous driving benchmarks. The policy outperforms the prior state of the art when tested on recorded real-world scenarios, amidst human drivers, without ever seeing human data during training. |
| Researcher Affiliation | Industry | 1Apple. Correspondence to: Philipp Krähenbühl <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Advantage filtering |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology or a direct link to a code repository. It refers to using third-party codebases like Stable Baselines. |
| Open Datasets | Yes | We test the GIGAFLOW policy in three leading independent third-party benchmarks: CARLA (Dosovitskiy et al., 2017), nuPlan (Caesar et al., 2022), and the Waymo Open Motion Dataset (Ettinger et al., 2021) (through the Waymax simulator (Gulino et al., 2023)). |
| Dataset Splits | Yes | The nuPlan benchmark consists of a training, validation and held-out test set... We evaluate GIGAFLOW on the Val14 benchmark... The full Waymo Open Motion Dataset (WOMD) 1.2.0 validation set consists of 44 097 scenarios, each 8 s long, running at 10 Hz. |
| Hardware Specification | Yes | GIGAFLOW is capable of simulating and learning from 4.4 billion state transitions (7.2 million km of driving, or 42 years of continuous driving experience) per hour on a single 8-GPU node... On an 8-GPU A100 node, the policy allows inference throughput of 7.4 million decisions per second during experience collection at a batch size of 2.6 million, and eight gradient updates per second in the training phase with a batch size of 256 000. |
| Software Dependencies | No | GIGAFLOW is a batched simulator (Makoviychuk et al., 2021; Freeman et al., 2021; Petrenko et al., 2021; Shacklett et al., 2021), implemented in PyTorch (Ansel et al., 2024)... Agents are trained using a version of Proximal Policy Optimization (PPO) (Schulman et al., 2017) derived from the Stable Baselines codebase (Raffin et al., 2021). While software is mentioned with citations, specific version numbers are not provided. |
| Experiment Setup | Yes | Table A3 provides our final list of training hyperparameters: Training batch size 256 000; Batch size per GPU 32 000; Rollout length 128; Num. PPO epochs 3; Discount factor γ 0.999; λ_GAE 0.95; Max. episode length 1200 steps (360 s); PPO clipping ratio 0.2; Value function clipping None; Initial LR α(0) 5×10⁻⁴; LR schedule Cosine; Entropy coefficient 0.01; Value loss coefficient 0.5; Max grad. norm 0.5; Advantage normalization Enabled; Adv. filtering threshold η 0.01·A_max (Alg. 1); Inference & training precision 16-bit AMP; Model weights initialization Orthogonal, zero bias |
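The review notes that the paper's only pseudocode is Algorithm 1 ("Advantage filtering"), and the hyperparameter table lists a filtering threshold of η = 0.01·A_max. A minimal sketch of what such a filter could look like, assuming (this is our reading, not the paper's exact rule) that transitions are kept only when their advantage magnitude exceeds η times the largest advantage magnitude in the batch:

```python
def advantage_filter(advantages, eta=0.01):
    """Hypothetical sketch of advantage filtering (cf. the paper's Alg. 1).

    Keeps a transition only if its advantage magnitude exceeds
    eta * A_max, where A_max is the largest advantage magnitude in the
    batch. The thresholding rule here is an assumption inferred from the
    hyperparameter entry "Adv. filtering threshold eta = 0.01 * A_max".
    """
    # Largest advantage magnitude in the batch.
    a_max = max(abs(a) for a in advantages)
    # Boolean mask: True for transitions retained for the PPO update.
    return [abs(a) >= eta * a_max for a in advantages]
```

In a PPO loop this mask would simply select which transitions contribute to the policy-gradient loss, discarding near-zero-advantage samples that carry little learning signal.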