Probabilistic Inference Techniques for Scalable Multiagent Decision Making

Authors: Akshat Kumar, Shlomo Zilberstein, Marc Toussaint

JAIR 2015 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We begin the empirical evaluation with experiments conducted with 2-agent problems. This is followed by experiments with larger problems involving up to 20 agents. We experimented with several standard 2-agent Dec-POMDP benchmarks with discounting factor γ = 0.9. We compare our EM algorithm with the decentralized bounded policy iteration (DEC-BPI) algorithm (Bernstein et al., 2009) and a non-linear, non-convex optimization solver (NLP) (Amato et al., 2010).
Researcher Affiliation Academia Akshat Kumar EMAIL School of Information Systems Singapore Management University, Singapore Shlomo Zilberstein EMAIL College of Information and Computer Sciences University of Massachusetts, Amherst, USA Marc Toussaint EMAIL Department of Computer Science University of Stuttgart, Germany
Pseudocode No The paper describes the Expectation-Maximization (EM) algorithm through mathematical equations and textual explanations of its steps (E-step and M-step), but it does not present any structured pseudocode blocks or clearly labeled algorithm sections.
Open Source Code No The paper states: "We implemented the EM algorithm in JAVA." (Section 6.1) and "The EM algorithm was implemented in JAVA." (Section 6.2). However, it does not provide any explicit statement about making the code publicly available, nor does it include a link to a code repository or mention code in supplementary materials. Future plans for exploration are mentioned, but not code release.
Open Datasets Yes We experimented with several standard 2-agent Dec-POMDP benchmarks with discounting factor γ = 0.9. (Section 6.1) ... Figure 9 shows the results for the multi-agent tiger problem, involving two doors with a tiger behind one door and a treasure behind the other. Agents should coordinate to open the door leading to the treasure (Amato et al., 2010). ... We experimented on a target tracking application in sensor networks modeled as an ND-POMDP (Nair et al., 2005). Figure 11 shows four sensor topologies: 5P, 11H and 15-3D by Marecki et al. (2008) and the largest 20D by Kumar and Zilberstein (2009b).
Dataset Splits No The paper uses standard multiagent planning benchmarks and problem domains (e.g., 'broadcast channel problem', 'recycling robots problem', 'multi-agent tiger problem') for evaluation. These are problem instances for which policies are optimized, rather than traditional datasets that are split into training, validation, and test sets. The paper does not specify any explicit dataset splits.
Hardware Specification Yes All our experiments were on a Mac with 4GB RAM and 2.4GHz CPU. (Section 6.1) ... All our experiments were done on an 8-core i Mac with 2GB RAM. Our implementation used multithreading to parallelize EM as highlighted in Section 5.5 and utilized all 8-cores. (Section 6.2)
Software Dependencies No The paper states: "We implemented the EM algorithm in JAVA." (Section 6.1) and mentions the use of "an off-the-shelf solver, Snopt (Gill et al., 2002)" for the NLP approach (Section 6.1). However, specific version numbers for JAVA or Snopt are not provided.
Experiment Setup Yes We experimented with several standard 2-agent Dec-POMDP benchmarks with discounting factor γ = 0.9. (Section 6.1) ... Each data point for every algorithm tested and parameter setting is an average of 10 runs with random initial controller parameters. (Section 6.1) ... EM converges quickly with the greedy M-step of Toussaint et al. (2008), taking fewer than 200 iterations even for such large multiagent planning problems. (Section 6.2) ... EM with 2-node controllers is very fast and takes < 1s to converge (50 iterations). (Figure 8 caption)