Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Heterogeneous-Agent Reinforcement Learning
Authors: Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We comprehensively test HARL algorithms on six challenging benchmarks and demonstrate their superior effectiveness and stability for coordinating heterogeneous agents compared to strong baselines such as MAPPO and QMIX. |
| Researcher Affiliation | Academia | Yifan Zhong1,2, Jakub Grudzien Kuba3, Xidong Feng4, Siyi Hu5, Jiaming Ji1, Yaodong Yang1; 1 Institute for Artificial Intelligence, Peking University; 2 Beijing Institute for General Artificial Intelligence; 3 University of Oxford; 4 University College London; 5 ReLER, AAII, University of Technology Sydney |
| Pseudocode | Yes | We propose the following Algorithm 1. Algorithm 1: Multi-Agent Policy Iteration with Monotonic Improvement Guarantee ... Algorithm Template 2: Heterogeneous-Agent Mirror Learning ... Algorithm 3: HATRPO ... Algorithm 4: HAPPO ... Algorithm 5: HAA2C ... Algorithm 6: HADDPG ... Algorithm 7: HATD3 ... Algorithm 8: HAD3QN |
| Open Source Code | Yes | 1. Our code is available at https://github.com/PKU-MARL/HARL. |
| Open Datasets | Yes | To facilitate the usage of HARL algorithms, we open-source our PyTorch-based integrated implementation. Based on this, we test HARL algorithms comprehensively on Multi-Agent Particle Environment (MPE) (Lowe et al., 2017; Mordatch and Abbeel, 2018), Multi-Agent MuJoCo (MAMuJoCo) (Peng et al., 2021), StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), SMACv2 (Ellis et al., 2022), Google Research Football Environment (GRF) (Kurach et al., 2020), and Bi-Dexterous Hands (Chen et al., 2022). |
| Dataset Splits | No | The paper does not explicitly state specific train/test/validation dataset splits (e.g., percentages or sample counts). It refers to standard benchmark environments and tasks where splits are inherent to the environment's design or implied by common practice in the research community. |
| Hardware Specification | Yes | The machine for experiments in this subsection is equipped with an AMD Ryzen 9 5950X 16-Core Processor and an NVIDIA RTX 3090 Ti GPU, and we ensure that no other experiments are running. |
| Software Dependencies | No | To facilitate the usage of HARL algorithms, we open-source our PyTorch-based integrated implementation. (Only 'PyTorch' is mentioned, without a specific version number.) |
| Experiment Setup | Yes | Details of hyper-parameters and experiment setups can be found in Appendix K. ... In this part, we present the common hyperparameters used for on-policy algorithms in Table 4 and for off-policy algorithms in Table 5 across all environments. |
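For context on the algorithms listed in the Pseudocode row: HAPPO's central idea is a sequential, permuted agent-by-agent update in which each agent maximizes a PPO-style clipped surrogate whose advantage is reweighted by the product of the probability ratios of the agents updated before it. The sketch below is illustrative only, assuming precomputed per-agent ratios and joint advantages; the function name, shapes, and the omission of the random permutation and gradient steps are simplifications, not the paper's implementation.

```python
import numpy as np

def happo_sequential_surrogates(ratios, advantages, clip_eps=0.2):
    """Illustrative HAPPO-style sequential surrogate computation.

    ratios:     list of per-agent probability ratios pi_new/pi_old, each shape (T,)
    advantages: joint advantage estimates, shape (T,)
    Returns the clipped surrogate objective value for each agent, where each
    agent's effective advantage M is the joint advantage multiplied by the
    ratios of all previously updated agents.
    """
    m = advantages.copy()  # M starts as the joint advantage A
    objectives = []
    for r in ratios:  # agents processed in (nominally permuted) order
        clipped = np.clip(r, 1.0 - clip_eps, 1.0 + clip_eps)
        # PPO-style pessimistic (min) clipped surrogate on the reweighted advantage
        objectives.append(float(np.mean(np.minimum(r * m, clipped * m))))
        m = r * m  # fold this agent's ratio into M for subsequent agents
    return objectives
```

With all ratios equal to one (no policy change), each agent's surrogate reduces to the mean advantage, which makes the reweighting chain easy to sanity-check.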