Explaining Decisions of Agents in Mixed-Motive Games
Authors: Maayan Orner, Oleg Maksimov, Akiva Kleinerman, Charles Ortiz, Sarit Kraus
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents the results of our evaluation experiments in Diplomacy and Risk. The experiments conducted in the COP game are summarized here and described in detail in the appendix. We conducted two complementary studies with humans in two different environments. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Bar-Ilan University, Israel; ²SRI International, USA |
| Pseudocode | Yes | Algorithm 1: Simulate |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | No | The paper mentions using "no-press Diplomacy", "Communicate Out of Prison (COP) game", and a "simplified version of Risk". While Diplomacy is a known game, the specific 'game environment from (Paquette et al. 2019)' is cited as an external resource used, not a dataset created and shared by the authors. The COP game was designed by the authors, and the experimental data generated (e.g., "randomly generated 30 Diplomacy game states", "12 board states" for Risk, "simulated the game until it included some chat history") are not stated to be publicly available with access information. |
| Dataset Splits | No | The paper describes generating specific game states for human user studies (e.g., "randomly generated 30 Diplomacy game states", "generated 12 board states" for Risk). However, it does not provide specific train/test/validation dataset splits or methodologies typically used for model training and evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions using GPT-4 for the COP game and neural policy networks for Diplomacy, but it does not specify versions for any programming languages, libraries (e.g., PyTorch, TensorFlow), or other software dependencies. |
| Experiment Setup | Yes | Explanation estimation: To explain action aᵢᵉ (e denotes "explained") given state s, the following steps are performed: 1. Simulate the next turn from s k times, where agent i performs action aᵢᵉ and all other agents follow their respective policies. 2. Estimate the utility values of each outcome using the value functions and rewards (Algorithm 1, line 13). ... We run k simulations from state sₜ, where agent i performs action aᵢᵉ and all other agents follow their respective policies. Then, we extract the most commonly used action of each agent accordingly. ... For the probable-actions-based explanations, we examined how the temperature parameter affected the game outcomes. We found that using a temperature τ = 0, which corresponds to greedy decoding (our approach), sometimes led to outcomes that were not probable when using τ = 0.7. |
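The simulation step quoted in the Experiment Setup row (run k rollouts where agent i plays the explained action, estimate outcome utilities, and extract each other agent's most common action) can be sketched as a small Monte Carlo routine. Everything below (the names `transition`, `value_fn`, the policy interface) is an illustrative assumption, not the paper's actual code:

```python
import random
from collections import Counter

def estimate_explanation(state, agent, explained_action,
                         policies, transition, value_fn, k=100, seed=0):
    """Sketch of the simulation-based explanation estimation step.

    Runs k one-turn simulations from `state` in which `agent` plays the
    explained action while every other agent samples from its own policy.
    Returns (i) the mean estimated utility of the resulting outcomes for
    `agent` and (ii) each other agent's most frequently sampled action.
    """
    rng = random.Random(seed)
    utilities = []
    counts = {a: Counter() for a in policies if a != agent}
    for _ in range(k):
        joint = {agent: explained_action}
        for other, policy in policies.items():
            if other == agent:
                continue
            action = policy(state, rng)   # sample the other agent's policy
            joint[other] = action
            counts[other][action] += 1
        next_state = transition(state, joint, rng)
        utilities.append(value_fn(next_state, agent))
    most_common = {a: c.most_common(1)[0][0] for a, c in counts.items()}
    return sum(utilities) / k, most_common

# Toy two-agent usage: agent "B" holds or attacks uniformly at random,
# and the state's value to "A" rises when "B" holds.
def policy_b(state, rng):
    return rng.choice(["attack", "hold"])

def transition(state, joint, rng):
    return state + (1 if joint["B"] == "hold" else -1)

def value_fn(state, agent):
    return state

mean_utility, probable = estimate_explanation(
    0, "A", "move", {"A": None, "B": policy_b}, transition, value_fn, k=200)
```

Here `probable["B"]` plays the role of the "most commonly used action" that the paper's probable-actions explanations are built from, and `mean_utility` plays the role of the utility estimate from Algorithm 1.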