DESPOT: Online POMDP Planning with Regularization

Authors: Nan Ye, Adhiraj Somani, David Hsu, Wee Sun Lee

JAIR 2017

Reproducibility Variable: Result — LLM Response

Research Type: Experimental
"The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for real-time vehicle control. The source code for the algorithm is available online. ... Experiments show that the anytime DESPOT algorithm is successful on very large POMDPs with up to 10^56 states."

Researcher Affiliation: Academia
Nan Ye — ACEMS & Queensland University of Technology, Australia; Adhiraj Somani, David Hsu, Wee Sun Lee — National University of Singapore, Singapore

Pseudocode: Yes
"Appendix B. Pseudocode for Anytime DESPOT ... Algorithm 6 Anytime DESPOT"

Open Source Code: Yes
"The source code for the algorithm is available online. ... The source code for the algorithm is available at http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/."

Open Datasets: Yes
"Tag is a standard POMDP benchmark introduced by Pineau et al. (2003). ... Next we consider Rock Sample, a well-established benchmark with a large state space (Smith & Simmons, 2004). ... Pocman (Silver & Veness, 2010) is a partially observable variant of the popular video game Pacman (Figure 4d)."

Dataset Splits: No
"For each algorithm, we tuned the key parameters on each domain through offline training, using a data set distinct from the online test data set, as we expect this to be the common usage mode for online planning. ... The online POMDP algorithms were given exactly 1 second per step to choose an action."

Hardware Specification: No
The paper provides no specific hardware details (GPU models, CPU models, memory, or cloud instance types) for running the experiments. It mentions only that "All algorithms were implemented in C++" and the time limit for online planning.

Software Dependencies: No
"We implemented DESPOT and AEMS2 ourselves. We used the authors' implementation of POMCP (Silver & Veness, 2010), but improved the implementation to support a very large number of observations and strictly adhere to the time limit for online planning. We used the APPL package for SARSOP (Kurniawati et al., 2008). All algorithms were implemented in C++."

Experiment Setup: Yes
"Specifically, the regularization parameter λ for DESPOT was selected offline from the set {0, 0.01, 0.1, 1, 10} by running the algorithm with a training set distinct from the online test set. Similarly, the exploration constant c of POMCP was chosen from the set {1, 10, 100, 1000, 10000} for the best performance. ... Specifically, we chose ξ = 0.95 as in SARSOP (Kurniawati et al., 2008). We chose D = 90 for DESPOT because γ^D ≈ 0.01 when γ = 0.95, which is the typical discount factor used. We chose K = 500, but a smaller value may work as well."
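The quoted setup justifies the planning depth D = 90 by the discount factor: with γ = 0.95, γ^D falls to roughly 0.01, so rewards beyond depth 90 contribute negligibly. A minimal sketch of that arithmetic (the values γ = 0.95 and the 0.01 target are from the quoted setup; the helper name is mine):

```python
import math

def min_depth(gamma: float, target: float) -> int:
    """Smallest depth D such that gamma**D <= target."""
    return math.ceil(math.log(target) / math.log(gamma))

D = min_depth(0.95, 0.01)
print(D)           # 90, matching the paper's choice of D = 90
print(0.95 ** D)   # ~0.0099, i.e. gamma^D has just dropped below 0.01
```

Since 0.95^89 is still slightly above 0.01, D = 90 is indeed the smallest depth meeting the cutoff.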