Deep Reinforcement Learning for Swarm Systems

Authors: Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the representation on two well-known problems from the swarm literature: rendezvous and pursuit evasion, in a globally and locally observable setup. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of complex collective strategies. Our results show that agents using our representation can learn faster and obtain policies of higher quality, suggesting that the representation as mean embedding is an efficient encoding of the global state configuration for swarm-based systems.
Researcher Affiliation | Academia | Maximilian Hüttenrauch (EMAIL), L-CAS, University of Lincoln, LN6 7TS Lincoln, UK; Adrian Šošić (EMAIL), Bioinspired Communication Systems, Technische Universität Darmstadt, 64283 Darmstadt, Germany; Gerhard Neumann (EMAIL), L-CAS, University of Lincoln, LN6 7TS Lincoln, UK
Pseudocode | No | The paper describes algorithms and methodologies in prose and through mathematical formulations, but does not include any explicitly labeled pseudocode or algorithm blocks with structured, code-like formatting.
Open Source Code | Yes | "The source code to reproduce our results can be found online." (Footnote 1: https://github.com/LCAS/deep_rl_for_swarms)
Open Datasets | No | The paper describes simulated environments for rendezvous and pursuit-evasion problems (Section 5.1 and Appendices A, B, C) and does not refer to any external, publicly available datasets with concrete access information such as links, DOIs, or specific citations.
Dataset Splits | No | The paper conducts experiments in simulated environments and trains policies; it does not use pre-existing static datasets that would require explicit training/test/validation splits. It describes a sampling strategy for experience collection: "Our implementation is based on the Open AI baselines version of TRPO with 10 MPI workers, where each worker samples 2048 time steps, resulting in 2048N samples. Subsequently, we randomly choose the data of 8 agents, yielding 2048 * 10 * 8 = 163840 samples per TRPO iteration."
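The quoted sampling arithmetic can be checked with a short NumPy sketch. The buffer contents are random stand-in data, and N = 20 agents is an illustrative choice (the paper only fixes the 10 workers, 2048 steps, and 8 kept agents):

```python
import numpy as np

# Setup quoted above: 10 MPI workers, 2048 time steps each, N agents.
n_workers, n_steps, n_agents = 10, 2048, 20  # N = 20 is illustrative

rng = np.random.default_rng(0)
# Stand-in rollout buffer: one 64-dim observation per (worker, step, agent).
observations = rng.normal(size=(n_workers, n_steps, n_agents, 64))

# Randomly keep the data of 8 agents, as described in the quote.
kept = rng.choice(n_agents, size=8, replace=False)
batch = observations[:, :, kept, :].reshape(-1, 64)

print(batch.shape[0])  # 2048 * 10 * 8 = 163840 samples per TRPO iteration
```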
Hardware Specification | Yes | "While a typical experiment with 20 agents in our setup takes between four and six hours of training on a machine with ten cores (sampling trajectories in parallel)... Calculations for this research were conducted on the Lichtenberg high performance computer of the TU Darmstadt."
Software Dependencies | No | The paper mentions "Our implementation is based on the Open AI baselines version of TRPO" but does not specify version numbers for OpenAI Baselines, TRPO, or any other software components.
Experiment Setup | Yes | "Our implementation is based on the Open AI baselines version of TRPO with 10 MPI workers, where each worker samples 2048 time steps, resulting in 2048N samples. Subsequently, we randomly choose the data of 8 agents, yielding 2048 * 10 * 8 = 163840 samples per TRPO iteration. The chosen number of samples worked well throughout our experiments and was not extensively tuned. In all other experiments, the neural network mean feature embedding for agent i, given by φ_NN(O_i) = (1/|O_i|) Σ_{o_{i,j} ∈ O_i} φ(o_{i,j}), is realized as the empirical mean of the outputs of a single-layer feed-forward neural network, φ(o_{i,j}) = h(W o_{i,j} + b), with 64 neurons and a ReLU non-linearity h. The histogram embedding is achieved with a two-dimensional histogram over the distance and bearing space to other agents. We use eight evenly spaced bins for each feature, resulting in a 64-dimensional feature vector."
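The two embeddings quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the observation dimensionality, the untrained weight initialization, and the distance range d_max are assumptions; only the 64 features, the ReLU, the empirical mean, and the 8 x 8 distance/bearing bins come from the quote:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_features = 4, 64          # obs_dim is an illustrative assumption
W = rng.normal(scale=0.1, size=(n_features, obs_dim))  # untrained stand-in
b = np.zeros(n_features)

def mean_embedding(O_i):
    """phi_NN(O_i): empirical mean of per-neighbor features ReLU(W o_ij + b)."""
    feats = np.maximum(O_i @ W.T + b, 0.0)   # (|O_i|, 64), ReLU non-linearity
    return feats.mean(axis=0)                # (64,)

def histogram_embedding(dist, bearing, d_max=10.0):
    """2-D histogram over distance/bearing, 8 evenly spaced bins per feature."""
    hist, _, _ = np.histogram2d(
        dist, bearing, bins=8,
        range=[[0.0, d_max], [-np.pi, np.pi]],  # d_max is an assumed cutoff
    )
    return hist.ravel()                      # 64-dimensional feature vector

O_i = rng.normal(size=(5, obs_dim))          # observations of 5 neighbors
print(mean_embedding(O_i).shape)             # (64,)
print(histogram_embedding(rng.uniform(0, 10, 5),
                          rng.uniform(-np.pi, np.pi, 5)).shape)  # (64,)
```

Both variants produce a fixed-size 64-dimensional vector regardless of the number of observed neighbors, which is the permutation-invariance property the paper relies on.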