Column Generation Algorithms for Constrained POMDPs

Authors: Erwin Walraven, Matthijs T. J. Spaan

JAIR 2018 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section we present our experimental evaluation based on single-agent and multiagent planning problems which include constraints. For single-agent problems we evaluate the performance of CGCP by comparing it with a finite-horizon version of CALP (Poupart et al., 2015). Experiments based on several domains show that our method outperforms the current state of the art.
Researcher Affiliation Academia Erwin Walraven EMAIL Matthijs T. J. Spaan EMAIL Delft University of Technology, The Netherlands
Pseudocode Yes Algorithm 1: Column generation; Algorithm 2: Point-based value iteration for finite-horizon POMDPs (Point Based); Algorithm 3: Belief expansion algorithm (Expand Beliefs); Algorithm 4: Generating a policy graph from alpha vectors (Generate Policy Graph); Algorithm 5: Adapted column generation (CGCP); Algorithm 6: Get the probability that the action defined by G and Γ1, . . . , Γh deviates; Algorithm 7: Computing upper bound using sawtooth approximation (Upper Bound)
Open Source Code No The paper states: We implemented the algorithms using Java version 8, and the experiments were executed on an Intel Xeon 3.70 GHz CPU with a 5 GB memory limit. For solving LPs we use Gurobi version 6.5.2. However, it does not explicitly state that the source code for the methodology described in this paper is made publicly available, nor does it provide a link to a repository.
Open Datasets Yes For the robot navigation domains we took the corresponding POMDP domain description from pomdp.org. Note that the Maze20 domain corresponds to milos-aaai97. We made the following changes to the domains. We added a trap state which ensures that the robot transitions to the trap state after reaching the goal. We also added an action which enables the robot to be idle for one time step, and we created a cost function in which all non-idle actions have cost 1. Table 4 shows the size of the domains considered. For the condition-based maintenance domain we use the bridge-repair domain provided on pomdp.org. In this domain the actions correspond to specific types of maintenance, followed by inspection. For online advertising problems we use the web-ad domain provided on pomdp.org.
Dataset Splits No The paper uses standard POMDP benchmark domains but describes modifications to them (e.g., adding a trap state or cost functions). It does not, however, specify any explicit training, test, or validation splits for these domains, as these are typically not applicable in the context of POMDP problems as datasets in the machine learning sense.
Hardware Specification Yes We implemented the algorithms using Java version 8, and the experiments were executed on an Intel Xeon 3.70 GHz CPU with a 5 GB memory limit. For solving LPs we use Gurobi version 6.5.2.
Software Dependencies Yes We implemented the algorithms using Java version 8, and the experiments were executed on an Intel Xeon 3.70 GHz CPU with a 5 GB memory limit. For solving LPs we use Gurobi version 6.5.2.
Experiment Setup Yes We run CGCP with a time limit of 1000 seconds (i.e., T = 1000). We solve the subproblems initially for at most 100 seconds, and upon convergence this is incremented by 100 seconds (i.e., τ = τ + = 100). We use precision ρ = 3, and the time limit of CALP is set to the actual runtime of CGCP on the same instance, with a minimum of 20 seconds. We run CGCP with a time limit of 3600 seconds (i.e., T = 3600), and for the point-based solver we initially solve during 60 seconds, which can be incremented by 60 seconds after converging (i.e., τ = τ + = 60). We use planning horizon h = 24 and precision ρ = 3.