MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

Authors: Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Yong Yu, Jun Wang, Weinan Zhang

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib). This appendix introduces some of the key evaluation results of MALib, and more results can be found on our project website (see issue #35). The evaluation focuses on both system and algorithm performance, including comparisons of data throughput, training efficiency, and algorithm convergence performance.
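Data throughput in comparisons like the one quoted above is typically reported as environment steps (frames) collected per second, summed over workers. A minimal, self-contained sketch of that metric; `DummyEnv` and the step cost are placeholders, not MALib code:

```python
import time

class DummyEnv:
    """Stand-in environment; step() returns a fake observation."""
    def step(self, action):
        return 0  # placeholder observation

def measure_throughput(env, n_steps=100_000):
    """Return environment steps per second (the 'data throughput' metric)."""
    start = time.perf_counter()
    for _ in range(n_steps):
        env.step(0)
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

# A framework's aggregate throughput is this quantity summed across
# all rollout workers running in parallel.
fps = measure_throughput(DummyEnv())
print(f"throughput: {fps:.0f} steps/s")
```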
Researcher Affiliation | Academia | (1) Department of Computer Science and Engineering, Shanghai Jiao Tong University; (2) Institute for Artificial Intelligence, Peking University; (3) Department of Computer Science, University College London
Pseudocode | No | The paper describes the framework components and their interactions (e.g., the Coordinator dispatching tasks to Actors and Learners, Figure 2), but it does not present any structured pseudocode or algorithm blocks.
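Since the paper gives no pseudocode, the Coordinator/Actor/Learner interaction it describes can be sketched roughly as follows. All class and method names here are hypothetical stand-ins for illustration, not MALib's actual API:

```python
from queue import Queue

class Actor:
    """Collects rollout samples with the current policy (stub)."""
    def rollout(self, policy):
        return [(policy, "obs", "act", 1.0)]  # one fake transition

class Learner:
    """Updates a policy from collected samples (stub)."""
    def train(self, samples):
        return len(samples)  # stand-in for an actual gradient update

class Coordinator:
    """Dispatches rollout tasks to Actors and training tasks to the Learner."""
    def __init__(self, actors, learner):
        self.actors, self.learner = actors, learner
        self.buffer = Queue()

    def step(self, policy):
        for actor in self.actors:          # dispatch simulation tasks
            for sample in actor.rollout(policy):
                self.buffer.put(sample)
        samples = []
        while not self.buffer.empty():     # hand collected data to the learner
            samples.append(self.buffer.get())
        return self.learner.train(samples)

coord = Coordinator([Actor() for _ in range(4)], Learner())
print(coord.step(policy="pi_0"))  # prints 4: one sample from each of 4 actors
```

In MALib itself this dispatch runs distributed via Ray actors rather than in-process queues; the sketch only shows the control flow.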
Open Source Code | Yes | The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib).
Open Datasets | Yes | As the environment for throughput comparison, we adopt the multi-agent version of Atari games (MA-Atari) from PettingZoo (Terry et al., 2020), a collection of 2D video games with multiple agents. We compared MALib with OpenSpiel (Lanctot et al., 2019) on solving Leduc Poker, a common benchmark in Poker AI. Multi-agent Particle Environments (MPE) (Lowe et al., 2017) is a typical benchmark environment for MARL research.
Dataset Splits | No | The paper mentions using specific environments like MA-Atari, Leduc Poker, and MPE, and describes some experimental parameters like running 2,000 simulations for Leduc Poker. However, it does not provide explicit details on dataset splits (e.g., train/test/validation percentages or counts) for any of these environments.
Hardware Specification | Yes | All the experiment results are obtained with one of the following hardware settings. System #1: a 32-core computing node with dual graphics cards; System #2: a two-node cluster, each node with 128 cores and a single graphics card. All GPUs mentioned are of the same model (NVIDIA RTX 3090).
Software Dependencies | No | The development of MALib is based on Python, Ray (Moritz et al., 2018) and PyTorch (Paszke et al., 2019). Dependencies are named, but no version numbers or full dependency list are given.
Experiment Setup | Yes | For each worker, we fixed the number of environments as 100. The number of workers ranges from 1 to 128 to compare the upper bound and bottleneck in the parallelism performance of different frameworks. To get a relatively accurate empirical payoff, we run 2,000 simulations for each policy combination, and the maximum population size is limited to 100. Specifically, it is constructed as a ConvNet with three convolutional layers, and two fully-connected heads for the actor and critic.