Self-Explaining Deviations for Coordination
Authors: Hengyuan Hu, Samuel Sokota, David Wu, Anton Bakhtin, Andrei Lupu, Brandon Cui, Jakob Foerster
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we evaluate IMPROVISED both in an illustrative toy setting and in the popular benchmark setting Hanabi, where we show that it can produce so-called finesse plays. We test IMPROVISED in two different settings. The first setting is the trampoline-tiger game explained before. Secondly, we apply IMPROVISED to three-player Hanabi, where we start from a blueprint trained on human data. |
| Researcher Affiliation | Collaboration | Hengyuan Hu (Stanford University), Samuel Sokota (Carnegie Mellon University), David Wu (Meta AI), Anton Bakhtin (Meta AI), Andrei Lupu (Meta AI & FLAIR, University of Oxford), Brandon Cui (MosaicML), Jakob N. Foerster (FLAIR, University of Oxford) |
| Pseudocode | Yes | Please refer to Appendix A for the detailed pseudocode. |
| Open Source Code | Yes | We provide the code for our Hanabi experiments at https://github.com/facebookresearch/off-belief-learning/blob/main/pyhanabi/finesse.py. |
| Open Datasets | Yes | Lastly, we present experiments on the large-scale benchmark Hanabi [1], where we show that IMPROVISED is able to produce finesse plays, which is one of the most interesting techniques that human experts perform frequently. To implement IMPROVISED in Hanabi, we first need a belief function from which we can sample game states given either public or private knowledge of the game to perform Monte Carlo rollouts. Luckily, the belief over possible hands in Hanabi can be computed analytically [8]. We use a blueprint policy to generate self-play games over a range of decks (game seeds). |
| Dataset Splits | No | The paper describes how specific experimental situations (finesse-able and finesse-complete) are generated for evaluation, but it does not provide explicit training, validation, or test dataset splits with percentages, counts, or specific pre-defined split methodologies for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or specific computing infrastructure) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'pyhanabi' for its Hanabi experiments and refers to various prior works for agents (e.g., MAPPO, QMIX, SAD, Other-Play, OBL), but it does not list specific version numbers for any key software components or libraries used in its own experimental setup. |
| Experiment Setup | Yes | The detailed hyper-parameters and computational cost are in Section C. |