DESPOT: Online POMDP Planning with Regularization
Authors: Nan Ye, Adhiraj Somani, David Hsu, Wee Sun Lee
JAIR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for real-time vehicle control. The source code for the algorithm is available online. ... Experiments show that the anytime DESPOT algorithm is successful on very large POMDPs with up to 10^56 states. |
| Researcher Affiliation | Academia | Nan Ye, ACEMS & Queensland University of Technology, Australia; Adhiraj Somani, David Hsu, Wee Sun Lee, National University of Singapore, Singapore |
| Pseudocode | Yes | Appendix B. Pseudocode for Anytime DESPOT ... Algorithm 6 Anytime DESPOT |
| Open Source Code | Yes | The source code for the algorithm is available online. ... 1. The source code for the algorithm is available at http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/. |
| Open Datasets | Yes | Tag is a standard POMDP benchmark introduced by Pineau et al. (2003). ... Next we consider Rock Sample, a well-established benchmark with a large state space (Smith & Simmons, 2004). ... Pocman (Silver & Veness, 2010) is a partially observable variant of the popular video game Pacman (Figure 4d). |
| Dataset Splits | No | For each algorithm, we tuned the key parameters on each domain through offline training, using a data set distinct from the online test data set, as we expect this to be the common usage mode for online planning. ... The online POMDP algorithms were given exactly 1 second per step to choose an action. |
| Hardware Specification | No | No specific hardware details (GPU models, CPU models, memory, or cloud instance types) are provided in the paper for running the experiments. The paper only mentions that "All algorithms were implemented in C++." and the time limit for online planning. |
| Software Dependencies | No | We implemented DESPOT and AEMS2 ourselves. We used the authors' implementation of POMCP (Silver & Veness, 2010), but improved the implementation to support a very large number of observations and strictly adhere to the time limit for online planning. We used the APPL package for SARSOP (Kurniawati et al., 2008). All algorithms were implemented in C++. |
| Experiment Setup | Yes | Specifically, the regularization parameter λ for DESPOT was selected offline from the set {0, 0.01, 0.1, 1, 10} by running the algorithm with a training set distinct from the online test set. Similarly, the exploration constant c of POMCP was chosen from the set {1, 10, 100, 1000, 10000} for the best performance. ... Specifically, we chose ξ = 0.95 as in SARSOP (Kurniawati et al., 2008). We chose D = 90 for DESPOT because γ^D ≈ 0.01 when γ = 0.95, which is the typical discount factor used. We chose K = 500, but a smaller value may work as well. |
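
The offline parameter selection described in the Experiment Setup row can be sketched as a simple grid search. The sketch below is illustrative only: `evaluate_planner` is a hypothetical stand-in for running DESPOT with a given λ on the training set and returning the average total discounted reward; it is not part of the paper's released code.

```python
# Hedged sketch of the offline hyperparameter selection described above.
# Assumed (not from the paper's code): evaluate_planner, a placeholder that
# would run the planner on held-out training episodes and return the mean
# total discounted reward for the given regularization parameter.

LAMBDA_GRID = [0, 0.01, 0.1, 1, 10]  # candidate set reported in the paper


def evaluate_planner(lmbda):
    # Dummy scoring function so the sketch runs standalone; a real run
    # would execute DESPOT episodes on the offline training set.
    return -abs(lmbda - 0.1)


def select_lambda(grid):
    # Pick the value with the best average training-set return.
    return max(grid, key=evaluate_planner)


best = select_lambda(LAMBDA_GRID)
print(best)
```

The same pattern applies to POMCP's exploration constant c over {1, 10, 100, 1000, 10000}: each candidate is evaluated on training data distinct from the online test set, and the best-scoring value is fixed before testing.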