System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning
Authors: Matteo Bettini, Ajay Shankar, Amanda Prorok
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through simulations of a variety of cooperative multi-robot tasks, we show how our metric constitutes an important tool that enables measurement and control of behavioral heterogeneity. |
| Researcher Affiliation | Academia | Matteo Bettini, Department of Computer Science and Technology, University of Cambridge, UK; Ajay Shankar, Department of Computer Science and Technology, University of Cambridge, UK; Amanda Prorok, Department of Computer Science and Technology, University of Cambridge, UK |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. The methods are described in narrative text and mathematical equations. |
| Open Source Code | Yes | The code for the experiments used in this paper is publicly available at https://github.com/proroklab/HetGPPO. |
| Open Datasets | Yes | The simulation environments representing these tasks are partly new implementations and partly adapted from existing environments in the VMAS benchmark set (Bettini et al., 2022). |
| Dataset Splits | No | The paper describes generating experience through 'rollouts' and 'episodes' (e.g., '600 episodes of experience', '300 (wind) episodes of experience') which are used for training, but it does not specify fixed train/test/validation splits of a static dataset. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | Training is performed in RLlib (Liang et al., 2018) using PyTorch (Paszke et al., 2019) and a multi-agent implementation of the PPO algorithm (Blumenkamp and Prorok, 2021). The paper names these software components but does not provide their specific version numbers. |
| Experiment Setup | Yes | Table 4: Training parameters for all evaluations (PPO). Batch size 60000; Minibatch size 4096; SGD iterations 40; # Workers 5; # Envs per worker 50; Learning rate 5e-5; ϵ 0.2; γ 0.99; λ 0.9; Entropy coeff 0; KL coeff 0.01; KL target 0.01. |
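Since the paper reports training in RLlib, the Table 4 hyperparameters can be sketched as an RLlib-style PPO configuration dict. This is an illustrative mapping only: the key names below follow RLlib's classic config-dict convention and may differ across RLlib versions; the paper itself does not list these key names.

```python
# Hypothetical mapping of the paper's Table 4 hyperparameters onto
# RLlib classic PPO config-dict keys (key names are an assumption;
# only the numeric values come from the paper).
ppo_config = {
    "train_batch_size": 60000,    # Batch size
    "sgd_minibatch_size": 4096,   # Minibatch size
    "num_sgd_iter": 40,           # SGD iterations per batch
    "num_workers": 5,             # Rollout workers
    "num_envs_per_worker": 50,    # Vectorized envs per worker
    "lr": 5e-5,                   # Learning rate
    "clip_param": 0.2,            # PPO clip epsilon
    "gamma": 0.99,                # Discount factor
    "lambda": 0.9,                # GAE lambda
    "entropy_coeff": 0.0,         # Entropy bonus coefficient
    "kl_coeff": 0.01,             # KL penalty coefficient
    "kl_target": 0.01,            # Target KL divergence
}

# Effective number of parallel environments per rollout step:
parallel_envs = ppo_config["num_workers"] * ppo_config["num_envs_per_worker"]
```

A dict like this would typically be passed to an RLlib PPO trainer; reproducing the paper's results would additionally require its multi-agent policy setup and the VMAS environments.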