Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Heterogeneous-Agent Reinforcement Learning

Authors: Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We comprehensively test HARL algorithms on six challenging benchmarks and demonstrate their superior effectiveness and stability for coordinating heterogeneous agents compared to strong baselines such as MAPPO and QMIX.
Researcher Affiliation | Academia | Yifan Zhong (1,2), Jakub Grudzien Kuba (3), Xidong Feng (4), Siyi Hu (5), Jiaming Ji (1), Yaodong Yang (1); affiliations: 1 Institute for Artificial Intelligence, Peking University; 2 Beijing Institute for General Artificial Intelligence; 3 University of Oxford; 4 University College London; 5 ReLER, AAII, University of Technology Sydney
Pseudocode | Yes | We propose the following Algorithm 1. Algorithm 1: Multi-Agent Policy Iteration with Monotonic Improvement Guarantee ... Algorithm Template 2: Heterogeneous-Agent Mirror Learning ... Algorithm 3: HATRPO ... Algorithm 4: HAPPO ... Algorithm 5: HAA2C ... Algorithm 6: HADDPG ... Algorithm 7: HATD3 ... Algorithm 8: HAD3QN
Open Source Code | Yes | Our code is available at https://github.com/PKU-MARL/HARL.
Open Datasets | Yes | To facilitate the usage of HARL algorithms, we open-source our PyTorch-based integrated implementation. Based on this, we test HARL algorithms comprehensively on Multi-Agent Particle Environment (MPE) (Lowe et al., 2017; Mordatch and Abbeel, 2018), Multi-Agent MuJoCo (MAMuJoCo) (Peng et al., 2021), StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), SMACv2 (Ellis et al., 2022), Google Research Football Environment (GRF) (Kurach et al., 2020), and Bi-Dexterous Hands (Chen et al., 2022).
Dataset Splits | No | The paper does not explicitly state specific train/test/validation dataset splits (e.g., percentages or sample counts). It refers to standard benchmark environments whose splits are inherent to the environment's design or implied by common practice in the research community.
Hardware Specification | Yes | The machine for experiments in this subsection is equipped with an AMD Ryzen 9 5950X 16-Core Processor and an NVIDIA RTX 3090 Ti GPU, and we ensure that no other experiments are running.
Software Dependencies | No | To facilitate the usage of HARL algorithms, we open-source our PyTorch-based integrated implementation. (Only "PyTorch" is mentioned, without a specific version number.)
Experiment Setup | Yes | Details of hyper-parameters and experiment setups can be found in Appendix K. ... In this part, we present the common hyperparameters used for on-policy algorithms in Table 4 and for off-policy algorithms in Table 5 across all environments.
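The HATRPO/HAPPO algorithms listed in the Pseudocode row share one structural idea: agents are updated one at a time in a random permutation, with each agent's advantage reweighted by the compounded probability ratios of the agents updated before it. The following is a minimal illustrative sketch of that sequential scheme, not the authors' implementation: the `happo_style_update` name, the `update_fn` callback, and the toy agents are hypothetical, and the real algorithms add trust-region or PPO-clipping machinery on top of this loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def happo_style_update(agents, advantages, update_fn):
    """Sequentially update agents in a random order (HAPPO-style sketch).

    agents:     dict mapping agent id -> policy parameters (any object).
    advantages: per-sample advantage estimates, shape (batch,).
    update_fn:  callback (params, weighted_adv) -> (new_params, ratio),
                where ratio = new_pi / old_pi evaluated on the batch.
    """
    order = rng.permutation(list(agents))
    # Compounded ratio factor: starts at 1, accumulates the ratios of
    # every previously updated agent in this permutation.
    m = np.ones_like(advantages)
    for agent_id in order:
        agents[agent_id], ratio = update_fn(agents[agent_id], m * advantages)
        m = m * ratio
    return agents
```

Because later agents see advantages already reweighted by `m`, each update accounts for how its predecessors just changed the joint policy; this compounding is what the paper's monotonic-improvement argument rests on.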