MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning
Authors: Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Yong Yu, Jun Wang, Weinan Zhang
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib). This appendix introduces some of the key evaluation results of MALib, and more results can be found on our project website. The evaluation focuses on both system and algorithm performance, including the comparison of data throughput, training efficiency, and algorithm convergence performance. |
| Researcher Affiliation | Academia | 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University 2 Institute for Artificial Intelligence, Peking University 3 Department of Computer Science, University College London corresponding authors |
| Pseudocode | No | The paper describes the framework components and their interactions (e.g., Coordinator dispatching tasks to Actors and Learners, Figure 2), but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib). |
| Open Datasets | Yes | As the environment for throughput comparison, we adopt the multi-agent version of Atari games (MA-Atari) from PettingZoo (Terry et al., 2020), a collection of 2D video games with multiple agents. We compared MALib with OpenSpiel (Lanctot et al., 2019) on solving Leduc Poker, a common benchmark in Poker AI. Multi-agent Particle Environments (MPE) (Lowe et al., 2017) is a typical benchmark environment for the research of MARL. |
| Dataset Splits | No | The paper mentions using specific environments like MA-Atari, Leduc Poker, and MPE, and describes some experimental parameters like running 2,000 simulations for Leduc Poker. However, it does not provide explicit details on dataset splits (e.g., train/test/validation percentages or counts) for any of these environments. |
| Hardware Specification | Yes | All the experiment results are obtained with one of the following hardware settings: System #1: a 32-core computing node with dual graphics cards; System #2: a two-node cluster with each node owning 128-core and a single graphics card. All the GPUs mentioned are of the same model (NVIDIA RTX3090). |
| Software Dependencies | No | The development of MALib is based on Python, Ray (Moritz et al., 2018) and PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | For each worker, we fixed the number of environments as 100. The number of workers ranges from 1 to 128 to compare the upper bound and bottleneck in the parallelism performance of different frameworks. To get a relatively accurate empirical payoff, we run 2,000 simulations for each policy combination, and the maximum of population size is limited to 100. Specifically, it is constructed as a ConvNet with three convolutional layers, and two fully-connected heads for the actor and critic. |
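The throughput experiment described in the Experiment Setup row (a fixed 100 environments per worker, with the worker count swept from 1 to 128) can be sketched as follows. This is a minimal stand-in, not MALib's actual implementation: the paper's framework is built on Ray, but this sketch uses only the standard library, and `STEPS_PER_ENV`, `rollout_worker`, and `measure_throughput` are hypothetical names introduced here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

ENVS_PER_WORKER = 100  # fixed per-worker environment count, as in the paper
STEPS_PER_ENV = 10     # illustrative rollout length (not specified in the paper)


def rollout_worker(worker_id: int) -> int:
    """Simulate stepping ENVS_PER_WORKER environments and return the
    total number of environment steps collected by this worker."""
    steps = 0
    for _ in range(ENVS_PER_WORKER):
        steps += STEPS_PER_ENV  # a real worker would call env.step() in a loop here
    return steps


def measure_throughput(num_workers: int) -> int:
    """Return total environment steps gathered across num_workers parallel workers."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(rollout_worker, range(num_workers)))


if __name__ == "__main__":
    # The paper sweeps 1 to 128 workers; a small range suffices for the sketch.
    for n in (1, 2, 4):
        print(f"{n} workers -> {measure_throughput(n)} env steps")
```

In the real benchmark, wall-clock time per collected step (rather than a simulated step counter) would be recorded at each worker count to locate the parallelism bottleneck of each framework.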