Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination

Authors: Kunal Jha, Wilka Carvalho, Yancheng Liang, Simon Shaolei Du, Max Kleiman-Weiner, Natasha Jaques

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive simulated and human experiments to evaluate the performance of CEC agents against state-of-the-art (SOTA) baselines. Our human study reveals that CEC agents outperform PBT on performance and outperform all methods on subjective measures of cooperation.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of Washington, Seattle, WA; (2) Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA.
Pseudocode | Yes | Algorithm 1: Solvable Overcooked Coordination Challenge Generation
Open Source Code | Yes | Code for environment, training, and testing scripts and more can be found at https://kjha02.github.io/publication/cross-env-coop. Our human-AI experiments and surveys utilized the Nice Web RL Python package (https://github.com/wcarvalho/nicewebrl), which leverages Jax's parallelizability to efficiently crowd-source participant data on reinforcement learning environments.
Open Datasets | No | The paper describes how the authors procedurally generate environments for their experiments, e.g., "Our procedural generator creates new coordination challenges in Overcooked, shown in Figure 5." It does not provide access information (link, DOI, or citation) for a fixed, pre-existing dataset.
Dataset Splits | Yes | Note that we hold out those five layouts from the CEC generator, so that when we evaluate CEC on these layouts we are able to test generalization across both partners and tasks. Second, we introduce an additional evaluation setting where we have the Overcooked procedural environment generator create 100 coordination challenges that neither the ST baselines nor any of the CEC agents have seen during training and assess how well the different approaches can generalize to both novel partners and novel environments.
Hardware Specification | No | Jax allows us to run the entire training and evaluation pipeline, from the environment generation to the neural network updating of agents, at 10 million steps per minute on a single GPU. No specific GPU model or other hardware details are provided.
Software Dependencies | No | The paper mentions 'Jax-based' environments and the 'Nice Web RL Python package (https://github.com/wcarvalho/nicewebrl)', but does not provide specific version numbers for Jax, Python, or the Nice Web RL package.
Experiment Setup | Yes | We leverage this speed to train CEC agents, and all other baselines, for 3 billion steps. For each copy of the CEC agent, we perform an additional 100 million steps of training on a single layout with a reduced learning rate, again in self-play using IPPO. We train six seeds for each type of agent. We use the parameters in Table 2 for training all PPO agents: LR 3e-4, NUM_STEPS 256, TOTAL_TIMESTEPS 3e9, UPDATE_EPOCHS 4, NUM_MINIBATCHES 2, GAMMA 0.99, GAE_LAMBDA 0.95, CLIP_EPS 0.2, ENT_COEF 0.005, VF_COEF 1.0, MAX_GRAD_NORM 0.5, ANNEAL_LR True.
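The Dataset Splits entry describes holding out layouts from the CEC generator and evaluating on 100 freshly generated coordination challenges unseen during training. One common way to enforce this kind of leakage-free split is to partition procedural-generator seeds; the function name, seed counts, and use of Python's `random` module below are illustrative assumptions, not the paper's actual API.

```python
import random

def make_splits(n_train=1000, n_eval=100, seed=0):
    """Partition procedural-generator seeds so evaluation layouts
    are never seen during training (illustrative sketch only)."""
    rng = random.Random(seed)
    all_seeds = list(range(n_train + n_eval))
    rng.shuffle(all_seeds)
    train_seeds = set(all_seeds[:n_train])
    eval_seeds = set(all_seeds[n_train:])
    assert train_seeds.isdisjoint(eval_seeds)  # no train/eval leakage
    return train_seeds, eval_seeds

train_seeds, eval_seeds = make_splits()
```

Because layouts are generated deterministically from seeds, disjoint seed sets guarantee disjoint train and evaluation environments.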
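Combining the reported throughput (10 million steps per minute on a single GPU, from the Hardware Specification entry) with the 3 billion training steps in the Experiment Setup entry gives a rough wall-clock estimate. This back-of-the-envelope calculation is ours, not a figure reported in the paper.

```python
steps_total = 3e9        # total training steps reported in the paper
steps_per_minute = 10e6  # reported Jax pipeline throughput on one GPU
minutes = steps_total / steps_per_minute
hours = minutes / 60
# 300.0 minutes, i.e. 5.0 hours of training at the reported rate
```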
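The hyperparameters quoted from the paper's Table 2 can be collected into a single config dictionary. The dictionary values are taken directly from the quote above; the linear-annealing helper is our own sketch of how ANNEAL_LR=True is commonly implemented in PPO codebases, not code from the authors' repository.

```python
# Values quoted from Table 2 of the paper; layout is ours.
PPO_CONFIG = {
    "LR": 3e-4,
    "NUM_STEPS": 256,
    "TOTAL_TIMESTEPS": 3e9,
    "UPDATE_EPOCHS": 4,
    "NUM_MINIBATCHES": 2,
    "GAMMA": 0.99,
    "GAE_LAMBDA": 0.95,
    "CLIP_EPS": 0.2,
    "ENT_COEF": 0.005,
    "VF_COEF": 1.0,
    "MAX_GRAD_NORM": 0.5,
    "ANNEAL_LR": True,
}

def linear_lr(update, num_updates, base_lr=PPO_CONFIG["LR"]):
    """Linearly anneal the learning rate to zero over training.
    A common reading of ANNEAL_LR=True (assumption, not verified)."""
    frac = 1.0 - update / num_updates
    return base_lr * frac
```

With this schedule the learning rate starts at 3e-4 and decays to zero by the final update, matching the typical PPO annealing convention.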