OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination

Authors: Tobias Gessler, Tin Dizdarevic, Ani Calinescu, Benjamin Ellis, Andrei Lupu, Jakob Foerster

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we investigate the origins of ZSC challenges in Overcooked. We introduce a state-augmentation mechanism that mixes states that might be encountered when paired with unknown partners into the training distribution, reducing the out-of-distribution challenge associated with ZSC. Our results show that ZSC failures can largely be attributed to poor state coverage rather than more sophisticated coordination challenges. The Overcooked environment is therefore not suitable as a ZSC benchmark. To address these shortcomings, we introduce OvercookedV2¹, a new version of the benchmark, which includes asymmetric information and stochasticity, facilitating the creation of interesting ZSC scenarios. To validate OvercookedV2, we demonstrate that mere exhaustive state coverage is insufficient to coordinate well. Finally, we use OvercookedV2 to build a new range of coordination challenges, including ones that require test-time protocol formation, and we demonstrate the need for new coordination algorithms that can adapt online.
Researcher Affiliation | Academia | Tobias Gessler, Tin Dizdarevic, Anisoara Calinescu, Benjamin Ellis, Andrei Lupu, Jakob N. Foerster (FLAIR, University of Oxford)
Pseudocode | Yes | Algorithm 1: State-Augmented Self-Play Algorithm
Open Source Code | Yes | ¹ Available in JaxMARL: https://github.com/FLAIROx/JaxMARL. Experiment code is available at https://github.com/overcookedv2/experiments.
Open Datasets | Yes | The Overcooked benchmark, introduced by Carroll et al. (2020), is based on the popular video game Overcooked. We introduce a novel environment, OvercookedV2, that requires agents to coordinate for high returns. The environment is implemented as part of the popular JaxMARL framework (Rutherford et al., 2023).
Dataset Splits | No | The paper describes training agents using self-play and evaluating their performance in cross-play over a specified number of episodes (e.g., "500 episodes" for evaluation and "10 independent agent pairs" for training), but does not provide traditional fixed dataset splits (e.g., percentages or counts for training, validation, and test sets of a static dataset). The data is generated through interaction with the environment.
Hardware Specification | Yes | Our experiments were conducted on a server equipped with 8 NVIDIA A40 GPUs with 48 GB of memory and an AMD EPYC 7513 32-core processor.
Software Dependencies | Yes | The models were trained using JAX (Bradbury et al., 2018) and Flax (Heek et al., 2023).
Experiment Setup | Yes | The same hyperparameters are used for both the standard and state-augmented settings; an overview is provided in Appendix 3. Appendix D provides the hyperparameters used in our experiments (e.g., Table 3: Hyperparameters for the layouts Cramped Room, Asymmetric Advantages, Coordination Ring, Forced Coordination, and Counter Circuit).
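The state-augmentation idea summarised in the Research Type and Pseudocode rows can be illustrated with a minimal sketch. This is not the authors' Algorithm 1. It assumes a hypothetical setup in which states visited during rollouts with other, independently trained agents are stored in a bounded buffer, and each self-play episode starts from a buffered state with probability `p_aug` instead of the environment's default initial state. The names `StateBuffer`, `augmented_reset`, and `p_aug` are illustrative only and do not come from JaxMARL or the experiment repository.

```python
import random
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

S = TypeVar("S")  # any environment-state type


class StateBuffer(Generic[S]):
    """Hypothetical bounded buffer of states collected from partner rollouts."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen silently evicts the oldest state when full
        self._states: Deque[S] = deque(maxlen=capacity)

    def add(self, state: S) -> None:
        self._states.append(state)

    def sample(self, rng: random.Random) -> S:
        # uniform sample over the currently stored states
        return rng.choice(self._states)

    def __len__(self) -> int:
        return len(self._states)


def augmented_reset(
    default_reset: Callable[[], S],
    buffer: StateBuffer[S],
    p_aug: float,
    rng: random.Random,
) -> S:
    """Start a self-play episode from a buffered state with probability p_aug.

    Falls back to the environment's default reset when the buffer is empty,
    so training can begin before any partner states have been collected.
    """
    if len(buffer) > 0 and rng.random() < p_aug:
        return buffer.sample(rng)
    return default_reset()
```

In a JAX implementation one would more likely keep the buffer as fixed-size arrays and pick the start state with `jax.random` and `jnp.where` so the reset stays jittable; the plain-Python version above only shows the sampling logic.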
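The Dataset Splits row notes that agents are trained in self-play and evaluated in cross-play across independently trained agents. One common way to summarise such an evaluation is a cross-play matrix whose entry (i, j) is the mean return when agent i is paired with agent j: diagonal entries are self-play scores, off-diagonal entries are cross-play scores. The sketch below assumes the per-pair mean returns have already been computed (for example, by averaging over evaluation episodes). The function name `xp_summary` is hypothetical, and the numbers in the example are made up for illustration, not results from the paper.

```python
from statistics import mean
from typing import Sequence, Tuple


def xp_summary(returns: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (self-play mean, cross-play mean) for a square matrix of mean returns.

    returns[i][j] is the mean episode return when agent i is paired with agent j,
    each agent coming from an independent training run (seed).
    """
    n = len(returns)
    if n < 2 or any(len(row) != n for row in returns):
        raise ValueError("need a square matrix with at least two agents")
    self_play = mean(returns[i][i] for i in range(n))
    cross_play = mean(returns[i][j] for i in range(n) for j in range(n) if i != j)
    return self_play, cross_play


# Made-up numbers: three seeds that score well with themselves but poorly
# with each other, a pattern typically read as a coordination failure.
example = [
    [200.0, 40.0, 60.0],
    [50.0, 210.0, 30.0],
    [70.0, 20.0, 190.0],
]
sp, xp = xp_summary(example)
print(f"self-play {sp:.1f}, cross-play {xp:.1f}, gap {sp - xp:.1f}")
# prints: self-play 200.0, cross-play 45.0, gap 155.0
```

Reporting the self-play/cross-play gap alongside the raw cross-play score makes it easy to see whether agents that train well individually still fail to coordinate with unfamiliar partners.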