reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Column Generation Algorithms for Constrained POMDPs

Authors: Erwin Walraven, Matthijs T. J. Spaan

JAIR 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section we present our experimental evaluation based on single-agent and multiagent planning problems which include constraints. For single-agent problems we evaluate the performance of CGCP by comparing it with a ﬁnite-horizon version of CALP (Poupart et al., 2015). Experiments based on several domains show that our method outperforms the current state of the art.
Researcher Affiliation	Academia	Erwin Walraven EMAIL Matthijs T. J. Spaan EMAIL Delft University of Technology, The Netherlands
Pseudocode	Yes	Algorithm 1: Column generation; Algorithm 2: Point-based value iteration for ﬁnite-horizon POMDPs (Point Based); Algorithm 3: Belief expansion algorithm (Expand Beliefs); Algorithm 4: Generating a policy graph from alpha vectors (Generate Policy Graph); Algorithm 5: Adapted column generation (CGCP); Algorithm 6: Get the probability that the action deﬁned by G and Γ1, . . . , Γh deviates; Algorithm 7: Computing upper bound using sawtooth approximation (Upper Bound)
Open Source Code	No	The paper states: We implemented the algorithms using Java version 8, and the experiments were executed on an Intel Xeon 3.70 GHz CPU with a 5 GB memory limit. For solving LPs we use Gurobi version 6.5.2. However, it does not explicitly state that the source code for the methodology described in this paper is made publicly available, nor does it provide a link to a repository.
Open Datasets	Yes	For the robot navigation domains we took the corresponding POMDP domain description from pomdp.org. Note that the Maze20 domain corresponds to milos-aaai97. We made the following changes to the domains. We added a trap state which ensures that the robot transitions to the trap state after reaching the goal. We also added an action which enables the robot to be idle for one time step, and we created a cost function in which all non-idle actions have cost 1. Table 4 shows the size of the domains considered. For the condition-based maintenance domain we use the bridge-repair domain provided on pomdp.org. In this domain the actions correspond to speciﬁc types of maintenance, followed by inspection. For online advertising problems we use the web-ad domain provided on pomdp.org.
Dataset Splits	No	The paper uses standard POMDP benchmark domains but describes modifications to them (e.g., adding a trap state or cost functions). It does not, however, specify any explicit training, test, or validation splits for these domains, as these are typically not applicable in the context of POMDP problems as datasets in the machine learning sense.
Hardware Specification	Yes	We implemented the algorithms using Java version 8, and the experiments were executed on an Intel Xeon 3.70 GHz CPU with a 5 GB memory limit. For solving LPs we use Gurobi version 6.5.2.
Software Dependencies	Yes	We implemented the algorithms using Java version 8, and the experiments were executed on an Intel Xeon 3.70 GHz CPU with a 5 GB memory limit. For solving LPs we use Gurobi version 6.5.2.
Experiment Setup	Yes	We run CGCP with a time limit of 1000 seconds (i.e., T = 1000). We solve the subproblems initially for at most 100 seconds, and upon convergence this is incremented by 100 seconds (i.e., τ = τ + = 100). We use precision ρ = 3, and the time limit of CALP is set to the actual runtime of CGCP on the same instance, with a minimum of 20 seconds. We run CGCP with a time limit of 3600 seconds (i.e., T = 3600), and for the point-based solver we initially solve during 60 seconds, which can be incremented by 60 seconds after converging (i.e., τ = τ + = 60). We use planning horizon h = 24 and precision ρ = 3.