Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Heterogeneous-Agent Reinforcement Learning

Authors: Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We comprehensively test HARL algorithms on six challenging benchmarks and demonstrate their superior effectiveness and stability for coordinating heterogeneous agents compared to strong baselines such as MAPPO and QMIX.
Researcher Affiliation | Academia | Yifan Zhong (1,2), Jakub Grudzien Kuba (3), Xidong Feng (4), Siyi Hu (5), Jiaming Ji (1), Yaodong Yang (1); affiliations: 1 Institute for Artificial Intelligence, Peking University; 2 Beijing Institute for General Artificial Intelligence; 3 University of Oxford; 4 University College London; 5 ReLER, AAII, University of Technology Sydney
Pseudocode | Yes | We propose the following Algorithm 1. Algorithm 1: Multi-Agent Policy Iteration with Monotonic Improvement Guarantee ... Algorithm Template 2: Heterogeneous-Agent Mirror Learning ... Algorithm 3: HATRPO ... Algorithm 4: HAPPO ... Algorithm 5: HAA2C ... Algorithm 6: HADDPG ... Algorithm 7: HATD3 ... Algorithm 8: HAD3QN
Open Source Code | Yes | Our code is available at https://github.com/PKU-MARL/HARL.
Open Datasets | Yes | To facilitate the usage of HARL algorithms, we open-source our PyTorch-based integrated implementation. Based on this, we test HARL algorithms comprehensively on Multi-Agent Particle Environment (MPE) (Lowe et al., 2017; Mordatch and Abbeel, 2018), Multi-Agent MuJoCo (MAMuJoCo) (Peng et al., 2021), StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), SMACv2 (Ellis et al., 2022), Google Research Football Environment (GRF) (Kurach et al., 2020), and Bi-Dexterous Hands (Chen et al., 2022).
Dataset Splits | No | The paper does not explicitly state specific train/test/validation dataset splits (e.g., percentages or sample counts). It refers to standard benchmark environments whose splits are inherent to the environment's design or implied by common practice in the research community.
Hardware Specification | Yes | The machine for experiments in this subsection is equipped with an AMD Ryzen 9 5950X 16-Core Processor and an NVIDIA RTX 3090 Ti GPU, and we ensure that no other experiments are running.
Software Dependencies | No | To facilitate the usage of HARL algorithms, we open-source our PyTorch-based integrated implementation. (Only "PyTorch" is mentioned, without a specific version number.)
Experiment Setup | Yes | Details of hyper-parameters and experiment setups can be found in Appendix K. ... In this part, we present the common hyperparameters used for on-policy algorithms in Table 4 and for off-policy algorithms in Table 5 across all environments.
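The HATRPO/HAPPO algorithms listed in the Pseudocode row share one structural idea: agents are updated one at a time in a random permutation, with each agent's advantage reweighted by the compounded probability ratios of the agents updated before it. The following is a minimal illustrative sketch of that sequential scheme, not the authors' implementation: the `happo_style_update` name, the `update_fn` callback, and the toy agents are hypothetical, and the real algorithms add trust-region or PPO-clipping machinery on top of this loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def happo_style_update(agents, advantages, update_fn):
    """Sequentially update agents in a random order (HAPPO-style sketch).

    agents:     dict mapping agent id -> policy parameters (any object).
    advantages: per-sample advantage estimates, shape (batch,).
    update_fn:  callback (params, weighted_adv) -> (new_params, ratio),
                where ratio = new_pi / old_pi evaluated on the batch.
    """
    order = rng.permutation(list(agents))
    # Compounded ratio factor: starts at 1, accumulates the ratios of
    # every previously updated agent in this permutation.
    m = np.ones_like(advantages)
    for agent_id in order:
        agents[agent_id], ratio = update_fn(agents[agent_id], m * advantages)
        m = m * ratio
    return agents
```

Because later agents see advantages already reweighted by `m`, each update accounts for how its predecessors just changed the joint policy; this compounding is what the paper's monotonic-improvement argument rests on.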