On Centralized Critics in Multi-Agent Reinforcement Learning
Authors: Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Brett Daley, Christopher Amato
JAIR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we therefore formally analyze centralized and decentralized critic approaches, and analyze the effect of using state-based critics in partially observable environments. We derive theories contrary to the common intuition: critic centralization is not strictly beneficial, and using state values can be harmful. We further prove that, in particular, state-based critics can introduce unexpected bias and variance compared to history-based critics. Finally, we demonstrate how the theory applies in practice by comparing different forms of critics on a wide range of common multi-agent benchmarks. The experiments show practical issues such as the difficulty of representation learning with partial observability, which highlights why the theoretical problems are often overlooked in the literature. ... 7. Empirical Findings and Discussions In this section, we present experimental results comparing different types of critics. We test on a variety of popular research domains including, but not limited to, classical matrix games, the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan, Rashid, de Witt, Farquhar, Nardelli, Rudner, Hung, Torr, Foerster, & Whiteson, 2019), the Multi-agent Particle Environments (MPE) (Mordatch & Abbeel, 2018), and the MARL Environments Compilation (Jiang, 2019). |
| Researcher Affiliation | Academia | Xueguang Lyu EMAIL Andrea Baisero EMAIL Yuchen Xiao EMAIL Brett Daley EMAIL Christopher Amato EMAIL Northeastern University, Khoury College of Computer Sciences, 360 Huntington Avenue, Boston, MA 02115 USA |
| Pseudocode | Yes | Appendix D. Pseudocode Algorithm 1 IAC Algorithm 2 IACC-H Algorithm 3 IACC-S Algorithm 4 IACC-HS |
| Open Source Code | Yes | We provide open-source implementations1 used for our experiments. 1. https://github.com/lyu-xg/on-centralized-critics-in-marl |
| Open Datasets | Yes | We test on a variety of popular research domains including, but not limited to, classical matrix games, the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan, Rashid, de Witt, Farquhar, Nardelli, Rudner, Hung, Torr, Foerster, & Whiteson, 2019), the Multi-agent Particle Environments (MPE) (Mordatch & Abbeel, 2018), and the MARL Environments Compilation (Jiang, 2019). ... We test on the classic yet challenging domain Dec-Tiger (Nair et al., 2003). Recall that in Dec-Tiger... We use the Climb Game (Claus & Boutilier, 1998)... Morning Game Example The Morning Game shown in Table 3 is a matrix game inspired by previous work (Peshkin et al., 2000)... In Capture Target (Lyu & Amato, 2020)... Move Box (Jiang, 2019)... multi-agent recycling task... Meeting-in-a-Grid domains (Bernstein, Hansen, & Zilberstein, 2005; Amato, Dibangoye, & Zilberstein, 2009) |
| Dataset Splits | No | The paper mentions using well-known benchmarks and classical games, but it does not specify any training, testing, or validation splits (e.g., percentages or sample counts) for reproducibility. It discusses aggregating results over multiple runs, but not data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used to conduct the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries/solvers). |
| Experiment Setup | No | The paper discusses various critic methods and their theoretical and empirical performance but does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs, optimizer settings) or other detailed system-level training configurations in the main text. |
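The Pseudocode row above lists four algorithm variants (IAC, IACC-H, IACC-S, IACC-HS) that differ in what each agent's critic conditions on: the agent's own history, the joint history, the true state, or both. The following is a minimal illustrative sketch of that structural difference only; the tabular TD(0) critic and all names here are assumptions for exposition, not the paper's implementation (which is at the linked repository).

```python
from collections import defaultdict

class TabularCritic:
    """Tabular V-function with a TD(0) update, keyed by whatever the
    critic conditions on (history, state, or both)."""
    def __init__(self, lr=0.1, gamma=0.99):
        self.v = defaultdict(float)  # value estimate per key
        self.lr, self.gamma = lr, gamma

    def update(self, key, reward, next_key, done):
        # One-step bootstrapped target; the TD error serves as the
        # advantage estimate fed to the actor's policy-gradient update.
        target = reward + (0.0 if done else self.gamma * self.v[next_key])
        td_error = target - self.v[key]
        self.v[key] += self.lr * td_error
        return td_error

# The four variants differ only in the key the critic sees:
def iac_key(agent_history, joint_history, state):
    return agent_history            # IAC: decentralized, own history
def iacc_h_key(agent_history, joint_history, state):
    return joint_history            # IACC-H: centralized, joint history
def iacc_s_key(agent_history, joint_history, state):
    return state                    # IACC-S: centralized, state only
def iacc_hs_key(agent_history, joint_history, state):
    return (joint_history, state)   # IACC-HS: joint history and state
```

Under partial observability, two different joint histories can map to the same state, so the IACC-S keying collapses them into one value entry; this is the mechanism behind the paper's claim that state-based critics can bias the learned policy relative to history-based ones.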