On Centralized Critics in Multi-Agent Reinforcement Learning
Authors: Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Brett Daley, Christopher Amato
JAIR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we therefore formally analyze centralized and decentralized critic approaches, and analyze the effect of using state-based critics in partially observable environments. We derive theories contrary to the common intuition: critic centralization is not strictly beneficial, and using state values can be harmful. We further prove that, in particular, state-based critics can introduce unexpected bias and variance compared to history-based critics. Finally, we demonstrate how the theory applies in practice by comparing different forms of critics on a wide range of common multi-agent benchmarks. The experiments show practical issues such as the difficulty of representation learning with partial observability, which highlights why the theoretical problems are often overlooked in the literature. ... 7. Empirical Findings and Discussions In this section, we present experimental results comparing different types of critics. We test on a variety of popular research domains including, but not limited to, classical matrix games, the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan, Rashid, de Witt, Farquhar, Nardelli, Rudner, Hung, Torr, Foerster, & Whiteson, 2019), the Multi-agent Particle Environments (MPE) (Mordatch & Abbeel, 2018), and the MARL Environments Compilation (Jiang, 2019). |
| Researcher Affiliation | Academia | Xueguang Lyu EMAIL Andrea Baisero EMAIL Yuchen Xiao EMAIL Brett Daley EMAIL Christopher Amato EMAIL Northeastern University, Khoury College of Computer Sciences, 360 Huntington Avenue, Boston, MA 02115 USA |
| Pseudocode | Yes | Appendix D. Pseudocode Algorithm 1 IAC Algorithm 2 IACC-H Algorithm 3 IACC-S Algorithm 4 IACC-HS |
| Open Source Code | Yes | We provide open-source implementations1 used for our experiments. 1. https://github.com/lyu-xg/on-centralized-critics-in-marl |
| Open Datasets | Yes | We test on a variety of popular research domains including, but not limited to, classical matrix games, the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan, Rashid, de Witt, Farquhar, Nardelli, Rudner, Hung, Torr, Foerster, & Whiteson, 2019), the Multi-agent Particle Environments (MPE) (Mordatch & Abbeel, 2018), and the MARL Environments Compilation (Jiang, 2019). ... We test on the classic yet challenging domain Dec-Tiger (Nair et al., 2003). Recall that in Dec-Tiger... We use the Climb Game (Claus & Boutilier, 1998)... Morning Game Example The Morning Game shown in Table 3 is a matrix game inspired by previous work (Peshkin et al., 2000)... In Capture Target (Lyu & Amato, 2020)... Move Box (Jiang, 2019)... multi-agent recycling task... Meeting-in-a-Grid domains (Bernstein, Hansen, & Zilberstein, 2005; Amato, Dibangoye, & Zilberstein, 2009) |
| Dataset Splits | No | The paper mentions using well-known benchmarks and classical games, but it does not specify any training, testing, or validation splits (e.g., percentages or sample counts) for reproducibility. It discusses aggregating results over multiple runs, but not data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used to conduct the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries/solvers). |
| Experiment Setup | No | The paper discusses various critic methods and their theoretical and empirical performance but does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs, optimizer settings) or other detailed system-level training configurations in the main text. |
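The Pseudocode row above lists four algorithm variants (IAC, IACC-H, IACC-S, IACC-HS) that differ in what each agent's critic conditions on: the agent's own history, the joint history, the true state, or both. The following is a minimal illustrative sketch of that structural difference only; the tabular TD(0) critic and all names here are assumptions for exposition, not the paper's implementation (which is at the linked repository).

```python
from collections import defaultdict

class TabularCritic:
    """Tabular V-function with a TD(0) update, keyed by whatever the
    critic conditions on (history, state, or both)."""
    def __init__(self, lr=0.1, gamma=0.99):
        self.v = defaultdict(float)  # value estimate per key
        self.lr, self.gamma = lr, gamma

    def update(self, key, reward, next_key, done):
        # One-step bootstrapped target; the TD error serves as the
        # advantage estimate fed to the actor's policy-gradient update.
        target = reward + (0.0 if done else self.gamma * self.v[next_key])
        td_error = target - self.v[key]
        self.v[key] += self.lr * td_error
        return td_error

# The four variants differ only in the key the critic sees:
def iac_key(agent_history, joint_history, state):
    return agent_history            # IAC: decentralized, own history
def iacc_h_key(agent_history, joint_history, state):
    return joint_history            # IACC-H: centralized, joint history
def iacc_s_key(agent_history, joint_history, state):
    return state                    # IACC-S: centralized, state only
def iacc_hs_key(agent_history, joint_history, state):
    return (joint_history, state)   # IACC-HS: joint history and state
```

Under partial observability, two different joint histories can map to the same state, so the IACC-S keying collapses them into one value entry; this is the mechanism behind the paper's claim that state-based critics can bias the learned policy relative to history-based ones.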