Deep Reinforcement Learning for Swarm Systems

Authors: Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the representation on two well-known problems from the swarm literature: rendezvous and pursuit evasion, in a globally and locally observable setup. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of complex collective strategies. Our results show that agents using our representation can learn faster and obtain policies of higher quality, suggesting that the representation as mean embedding is an efficient encoding of the global state configuration for swarm-based systems.
Researcher Affiliation | Academia | Maximilian Hüttenrauch (EMAIL), L-CAS, University of Lincoln, LN6 7TS Lincoln, UK; Adrian Šošić (EMAIL), Bioinspired Communication Systems, Technische Universität Darmstadt, 64283 Darmstadt, Germany; Gerhard Neumann (EMAIL), L-CAS, University of Lincoln, LN6 7TS Lincoln, UK
Pseudocode | No | The paper describes algorithms and methodologies in prose and through mathematical formulations, but does not include any explicitly labeled pseudocode or algorithm blocks with structured, code-like formatting.
Open Source Code | Yes | "The source code to reproduce our results can be found online." (Footnote 1: https://github.com/LCAS/deep_rl_for_swarms)
Open Datasets | No | The paper describes simulated environments for rendezvous and pursuit-evasion problems (Section 5.1 and Appendices A, B, C) and does not refer to any external, publicly available datasets with concrete access information such as links, DOIs, or specific citations.
Dataset Splits | No | The paper conducts experiments in simulated environments and trains policies; it does not use pre-existing static datasets that would require explicit training/test/validation splits. It describes a sampling strategy for experience collection: "Our implementation is based on the Open AI baselines version of TRPO with 10 MPI workers, where each worker samples 2048 time steps, resulting in 2048N samples. Subsequently, we randomly choose the data of 8 agents, yielding 2048 * 10 * 8 = 163840 samples per TRPO iteration."
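The quoted sampling arithmetic can be checked with a short NumPy sketch. The buffer contents are random stand-in data, and N = 20 agents is an illustrative choice (the paper only fixes the 10 workers, 2048 steps, and 8 kept agents):

```python
import numpy as np

# Setup quoted above: 10 MPI workers, 2048 time steps each, N agents.
n_workers, n_steps, n_agents = 10, 2048, 20  # N = 20 is illustrative

rng = np.random.default_rng(0)
# Stand-in rollout buffer: one 64-dim observation per (worker, step, agent).
observations = rng.normal(size=(n_workers, n_steps, n_agents, 64))

# Randomly keep the data of 8 agents, as described in the quote.
kept = rng.choice(n_agents, size=8, replace=False)
batch = observations[:, :, kept, :].reshape(-1, 64)

print(batch.shape[0])  # 2048 * 10 * 8 = 163840 samples per TRPO iteration
```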
Hardware Specification | Yes | "While a typical experiment with 20 agents in our setup takes between four and six hours of training on a machine with ten cores (sampling trajectories in parallel)... Calculations for this research were conducted on the Lichtenberg high performance computer of the TU Darmstadt."
Software Dependencies | No | The paper mentions "Our implementation is based on the Open AI baselines version of TRPO" but does not specify version numbers for OpenAI Baselines, TRPO, or any other software components.
Experiment Setup | Yes | "Our implementation is based on the Open AI baselines version of TRPO with 10 MPI workers, where each worker samples 2048 time steps, resulting in 2048N samples. Subsequently, we randomly choose the data of 8 agents, yielding 2048 * 10 * 8 = 163840 samples per TRPO iteration. The chosen number of samples worked well throughout our experiments and was not extensively tuned. In all other experiments, the neural network mean feature embedding for agent i, given by φ_NN(O_i) = (1/|O_i|) Σ_{o_{i,j} ∈ O_i} φ(o_{i,j}), is realized as the empirical mean of the outputs of a single-layer feed-forward neural network, φ(o_{i,j}) = h(W o_{i,j} + b), with 64 neurons and a ReLU non-linearity h. The histogram embedding is achieved with a two-dimensional histogram over the distance and bearing space to other agents. We use eight evenly spaced bins for each feature, resulting in a 64-dimensional feature vector."
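The two embeddings quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the observation dimensionality, the untrained weight initialization, and the distance range d_max are assumptions; only the 64 features, the ReLU, the empirical mean, and the 8 x 8 distance/bearing bins come from the quote:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_features = 4, 64          # obs_dim is an illustrative assumption
W = rng.normal(scale=0.1, size=(n_features, obs_dim))  # untrained stand-in
b = np.zeros(n_features)

def mean_embedding(O_i):
    """phi_NN(O_i): empirical mean of per-neighbor features ReLU(W o_ij + b)."""
    feats = np.maximum(O_i @ W.T + b, 0.0)   # (|O_i|, 64), ReLU non-linearity
    return feats.mean(axis=0)                # (64,)

def histogram_embedding(dist, bearing, d_max=10.0):
    """2-D histogram over distance/bearing, 8 evenly spaced bins per feature."""
    hist, _, _ = np.histogram2d(
        dist, bearing, bins=8,
        range=[[0.0, d_max], [-np.pi, np.pi]],  # d_max is an assumed cutoff
    )
    return hist.ravel()                      # 64-dimensional feature vector

O_i = rng.normal(size=(5, obs_dim))          # observations of 5 neighbors
print(mean_embedding(O_i).shape)             # (64,)
print(histogram_embedding(rng.uniform(0, 10, 5),
                          rng.uniform(-np.pi, np.pi, 5)).shape)  # (64,)
```

Both variants produce a fixed-size 64-dimensional vector regardless of the number of observed neighbors, which is the permutation-invariance property the paper relies on.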