Ad-Hoc Human-AI Coordination Challenge

Authors: Tin Dizdarević, Ravi Hammond, Tobias Gessler, Anisoara Calinescu, Jonathan Cook, Matteo Gallici, Andrei Lupu, Jakob Nicolaus Foerster

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop human proxy agents on a large-scale human dataset that serve as robust, cheap, and reproducible human-like evaluation partners in AH2AC2. To encourage the development of data-efficient methods, we open-source a dataset of 3,079 games... We present baseline results for both two- and three-player Hanabi scenarios... Our empirical evaluation shows that these human proxy agents outperform pure imitation learning while maintaining human-like behaviour.
Researcher Affiliation | Academia | 1 FLAIR, University of Oxford, Oxford, UK; 2 Department of Computer Science, University of Oxford, UK; 3 Universitat Politècnica de Catalunya, Barcelona, Spain.
Pseudocode | No | The paper describes methods like HDR-IPPO, IPPO, and BC using mathematical formulations and textual explanations, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/FLAIROx/ah2ac2. We provide our code in an anonymous repository: https://anonymous.4open.science/r/ah2ac2-2FDA. At the time of submission, the open-sourced codebase includes: ... Code for training the baselines.
Open Datasets | Yes | To encourage the development of data-efficient methods, we open-source a dataset of 3,079 games, deliberately limiting the amount of available human gameplay data. The first open-source Hanabi human gameplay dataset, containing 1,858 two-player and 1,221 three-player games.
Dataset Splits | Yes | Specifically, for the 2p setting, we select 858 games for validation and 858 for testing; for the 3p setting, we allocate 221 games for validation and 221 for testing.
Hardware Specification | No | The paper does not provide specific GPU/CPU models, processor types, memory details, or any other explicit hardware specifications used for running its experiments.
Software Dependencies | No | The paper mentions software components and methods such as the Adam optimiser, PPO, and GRU, but it does not provide specific version numbers for any key software libraries, frameworks, or environments used for replication.
Experiment Setup | Yes | Table 9. Human proxy agent training configurations and architectures. We showcase both BC and IPPO hyperparameters in a single table. Table 12. Hyperparameters used for training agents in the ablation study. Table 15. Hyperparameters used for training BC, HDR-IPPO baselines on a 1,000-game data limit challenge. Table 17. Hyperparameters used for training all IPPO and BR-BC baseline agents.