System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning

Authors: Matteo Bettini, Ajay Shankar, Amanda Prorok

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through simulations of a variety of cooperative multi-robot tasks, we show how our metric constitutes an important tool that enables measurement and control of behavioral heterogeneity.
Researcher Affiliation | Academia | Matteo Bettini (EMAIL), Department of Computer Science and Technology, University of Cambridge, UK; Ajay Shankar (EMAIL), Department of Computer Science and Technology, University of Cambridge, UK; Amanda Prorok (EMAIL), Department of Computer Science and Technology, University of Cambridge, UK
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. The methods are described in narrative text and mathematical equations.
Open Source Code | Yes | The code for the experiments used in this paper is publicly available at https://github.com/proroklab/HetGPPO.
Open Datasets | Yes | The simulation environments representing these tasks are partly new implementations and partly adapted from existing environments in the VMAS benchmark set (Bettini et al., 2022).
Dataset Splits | No | The paper describes generating experience through 'rollouts' and 'episodes' (e.g., '600 episodes of experience', '300 (wind) episodes of experience') which are used for training, but it does not specify fixed train/test/validation splits of a static dataset.
Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | Training is performed in RLlib (Liang et al., 2018) using PyTorch (Paszke et al., 2019) and a multi-agent implementation of the PPO algorithm (Blumenkamp and Prorok, 2021). The paper mentions software components but does not provide their specific version numbers.
Experiment Setup | Yes | Table 4: Training parameters for all evaluations.
  Training algorithm: PPO
  Batch size: 60000
  Minibatch size: 4096
  SGD iterations: 40
  # Workers: 5
  # Envs per worker: 50
  Learning rate: 5e-5
  Clip parameter ε: 0.2
  Discount γ: 0.99
  GAE λ: 0.9
  Entropy coeff: 0
  KL coeff: 0.01
  KL target: 0.01
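As a rough sketch, the training parameters reported above can be expressed with RLlib's classic dict-based PPO configuration keys. The key names below follow older (1.x-era) RLlib conventions and may differ in newer releases; this is an illustration of the reported values, not the authors' actual configuration file.

```python
# Hypothetical RLlib-style PPO config mirroring Table 4 of the paper.
# Key names assume the classic RLlib dict config; verify against your RLlib version.
ppo_config = {
    "train_batch_size": 60000,      # Batch size
    "sgd_minibatch_size": 4096,     # Minibatch size
    "num_sgd_iter": 40,             # SGD iterations per batch
    "num_workers": 5,               # Rollout workers
    "num_envs_per_worker": 50,      # Vectorized envs per worker
    "lr": 5e-5,                     # Learning rate
    "clip_param": 0.2,              # PPO clip parameter (epsilon)
    "gamma": 0.99,                  # Discount factor
    "lambda": 0.9,                  # GAE lambda
    "entropy_coeff": 0.0,           # Entropy bonus coefficient
    "kl_coeff": 0.01,               # Initial KL penalty coefficient
    "kl_target": 0.01,              # Target KL divergence
}

if __name__ == "__main__":
    for key, value in ppo_config.items():
        print(f"{key}: {value}")
```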