OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination
Authors: Tobias Gessler, Tin Dizdarevic, Ani Calinescu, Benjamin Ellis, Andrei Lupu, Jakob Foerster
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we investigate the origins of ZSC challenges in Overcooked. We introduce a state-augmentation mechanism that mixes states that might be encountered when paired with unknown partners into the training distribution, reducing the out-of-distribution challenge associated with ZSC. Our results show that ZSC failures can largely be attributed to poor state coverage rather than more sophisticated coordination challenges. The Overcooked environment is therefore not suitable as a ZSC benchmark. To address these shortcomings, we introduce Overcooked V2, a new version of the benchmark, which includes asymmetric information and stochasticity, facilitating the creation of interesting ZSC scenarios. To validate Overcooked V2, we demonstrate that mere exhaustive state coverage is insufficient to coordinate well. Finally, we use Overcooked V2 to build a new range of coordination challenges, including ones that require test-time protocol formation, and we demonstrate the need for new coordination algorithms that can adapt online. |
| Researcher Affiliation | Academia | Tobias Gessler, Tin Dizdarevic, Anisoara Calinescu, Benjamin Ellis, Andrei Lupu, Jakob N. Foerster (FLAIR, University of Oxford) |
| Pseudocode | Yes | Algorithm 1 State-Augmented Self-Play Algorithm |
| Open Source Code | Yes | Available in JaxMARL: https://github.com/FLAIROx/JaxMARL Experiment code is available at https://github.com/overcookedv2/experiments. |
| Open Datasets | Yes | The Overcooked benchmark, introduced by Carroll et al. (2020), is based on the popular video game Overcooked. We introduce a novel environment, Overcooked V2, that requires agents to coordinate for high returns. The environment is implemented as part of the popular JaxMARL framework (Rutherford et al., 2023). |
| Dataset Splits | No | The paper describes training agents using self-play and evaluating their performance in cross-play over a specified number of episodes (e.g., "500 episodes" for evaluation and "10 independent agent pairs" for training), but does not provide traditional fixed dataset splits (e.g., percentages or counts for training, validation, and test sets of a static dataset). The data is generated through interaction with the environment. |
| Hardware Specification | Yes | Our experiments were conducted on a server equipped with 8 NVIDIA A40 GPUs with 48GB of memory and an AMD EPYC 7513 32-Core Processor. |
| Software Dependencies | Yes | The models were trained using JAX (Bradbury et al., 2018) and FLAX (Heek et al., 2023). |
| Experiment Setup | Yes | The same hyperparameters are used for both the standard and state-augmented settings; an overview is provided in Appendix 3. Appendix D provides the hyperparameters used in our experiments. (e.g., Table 3: Hyperparameters for the layouts: Cramped Room, Asymmetric advantages, Coordination Ring, Forced Coordination and Counter Circuit.) |
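The state-augmentation mechanism the paper describes (mixing states an agent might encounter with unknown partners into the self-play training distribution) can be illustrated with a minimal sketch. This is a hypothetical toy version, not the paper's Algorithm 1 or the JaxMARL API: the names `partner_state_buffer`, `augment_prob`, and the dictionary-based "state" are all illustrative assumptions.

```python
import random

# Toy sketch of state-augmented self-play resets (illustrative only).
# Ordinarily every self-play episode starts from the canonical initial
# state, so the policy never sees states that only arise when paired
# with an unfamiliar partner. The augmentation mixes such states into
# the reset distribution.

def standard_reset():
    # Standard self-play reset: the canonical initial state.
    return {"pos": 0}

def augmented_reset(partner_state_buffer, augment_prob, rng):
    """With probability `augment_prob`, start the episode from a state
    drawn from trajectories of other (unknown) partner policies;
    otherwise fall back to the standard initial state."""
    if partner_state_buffer and rng.random() < augment_prob:
        return rng.choice(partner_state_buffer)
    return standard_reset()

# Usage: states previously collected from other policies' rollouts
# (hypothetical values standing in for real environment states).
buffer = [{"pos": 3}, {"pos": 7}]
rng = random.Random(0)
starts = [augmented_reset(buffer, 0.5, rng) for _ in range(1000)]
frac_augmented = sum(s["pos"] != 0 for s in starts) / len(starts)
# Roughly half the episodes now begin in out-of-distribution states.
assert 0.3 < frac_augmented < 0.7
```

Training on this widened reset distribution is what lets the paper attribute Overcooked ZSC failures to poor state coverage: if augmentation alone closes the cross-play gap, the original challenge was distributional rather than a genuine coordination problem.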