Robust Autonomy Emerges from Self-Play

Authors: Marco Cusumano-Towner, David Hafner, Alexander Hertzberg, Brody Huval, Aleksei Petrenko, Eugene Vinitsky, Erik Wijmans, Taylor W. Killian, Stuart Bowers, Ozan Sener, Philipp Kraehenbuehl, Vladlen Koltun

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental The resulting policy achieves state-of-the-art performance on three independent autonomous driving benchmarks. The policy outperforms the prior state of the art when tested on recorded real-world scenarios, amidst human drivers, without ever seeing human data during training.
Researcher Affiliation Industry Apple. Correspondence to: Philipp Krähenbühl <EMAIL>.
Pseudocode Yes Algorithm 1 Advantage filtering
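The paper's Algorithm 1 (advantage filtering) is reported only as pseudocode; a minimal sketch of the idea, assuming the filter keeps transitions whose advantage magnitude exceeds a fraction η of the largest advantage magnitude in the batch (the threshold η = 0.01·Amax reported in Table A3), could look like this. Function and variable names are illustrative, not from the paper:

```python
import numpy as np

def advantage_filter(advantages: np.ndarray, eta: float = 0.01) -> np.ndarray:
    """Return a boolean mask selecting transitions whose advantage
    magnitude is at least eta * max |advantage| over the batch."""
    a_max = np.abs(advantages).max()          # A_max over the batch
    return np.abs(advantages) >= eta * a_max  # keep "surprising" transitions

adv = np.array([0.5, -0.002, 3.0, 0.01, -2.0])
mask = advantage_filter(adv, eta=0.01)
# threshold is 0.01 * 3.0 = 0.03, so the 0.002 and 0.01 entries are dropped
```

Filtering low-advantage transitions concentrates gradient updates on informative experience, which matters at the batch sizes the paper reports.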
Open Source Code No The paper does not provide an explicit statement about releasing source code for the methodology or a direct link to a code repository. It refers to using third-party codebases like Stable Baselines.
Open Datasets Yes We test the GIGAFLOW policy in three leading independent third-party benchmarks: CARLA (Dosovitskiy et al., 2017), nuPlan (Caesar et al., 2022), and the Waymo Open Motion Dataset (Ettinger et al., 2021) (through the Waymax simulator (Gulino et al., 2023)).
Dataset Splits Yes The nuPlan benchmark consists of a training, validation and held-out test set... We evaluate GIGAFLOW on the Val14 benchmark... The full Waymo Open Motion Dataset (WOMD) 1.2.0 validation set consists of 44 097 scenarios, each 8 s long, running at 10 Hz.
Hardware Specification Yes GIGAFLOW is capable of simulating and learning from 4.4 billion state transitions (7.2 million km of driving, or 42 years of continuous driving experience) per hour on a single 8-GPU node... On an 8-GPU A100 node, the policy allows inference throughput of 7.4 million decisions per second during experience collection at a batch size of 2.6 million, and eight gradient updates per second in the training phase with a batch size of 256 000.
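The reported throughput figures are internally consistent, which can be checked with a short calculation. The 0.3 s simulation step is inferred from Table A3 (1200 steps per 360 s episode); everything else comes from the quoted numbers:

```python
# Sanity check of the reported throughput: 4.4 billion transitions/hour
# should correspond to ~42 years of driving and ~7.2 million km,
# assuming a 0.3 s simulation step (1200 steps per 360 s episode).
steps_per_hour = 4.4e9
sim_seconds = steps_per_hour * 0.3              # simulated seconds per wall-clock hour
years = sim_seconds / (365.25 * 24 * 3600)      # ~42 years of continuous driving
km = 7.2e6
avg_speed_kmh = km / (sim_seconds / 3600)       # ~20 km/h average agent speed
```

The implied average speed of roughly 20 km/h is plausible for dense urban driving scenarios, which lends credibility to the quoted figures.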
Software Dependencies No GIGAFLOW is a batched simulator (Makoviychuk et al., 2021; Freeman et al., 2021; Petrenko et al., 2021; Shacklett et al., 2021), implemented in PyTorch (Ansel et al., 2024)... Agents are trained using a version of Proximal Policy Optimization (PPO) (Schulman et al., 2017) derived from the Stable Baselines codebase (Raffin et al., 2021). While software is mentioned with citations, specific version numbers are not provided.
Experiment Setup Yes Table A3 provides our final list of training hyperparameters: training batch size 256 000; batch size per GPU 32 000; rollout length 128; num. PPO epochs 3; discount factor γ 0.999; λ_GAE 0.95; max. episode length 1200 steps (360 s); PPO clipping ratio 0.2; value function clipping none; initial LR α(0) 5×10⁻⁴; LR schedule cosine; entropy coefficient 0.01; value loss coefficient 0.5; max. grad. norm 0.5; advantage normalization enabled; adv. filtering threshold η 0.01·Amax (Alg. 1); inference & training precision 16-bit AMP; model weights initialization orthogonal, zero bias.
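The table specifies a cosine learning-rate schedule with initial LR 5×10⁻⁴ but not its endpoint. A minimal sketch of a standard cosine decay to zero, which is the common convention and an assumption here:

```python
import math

def cosine_lr(step: int, total_steps: int, lr0: float = 5e-4) -> float:
    """Cosine decay from lr0 at step 0 to 0 at total_steps.
    The decay-to-zero endpoint is an assumption, not stated in the paper."""
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * step / total_steps))

cosine_lr(0, 1000)     # initial LR, 5e-4
cosine_lr(500, 1000)   # halfway: 2.5e-4
```

A warmup phase or a nonzero floor, if used, is not reported and would change this shape.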