A Finite-State Controller Based Offline Solver for Deterministic POMDPs
Authors: Alex Schutz, Yang You, Matías Mattamala, Ipek Caliskanelli, Bruno Lacerda, Nick Hawes
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose DetMCVI, an adaptation of the Monte Carlo Value Iteration (MCVI) algorithm for DetPOMDPs, which builds policies in the form of finite-state controllers (FSCs). DetMCVI solves large problems with a high success rate, outperforming existing baselines for DetPOMDPs. We also verify the performance of the algorithm in a real-world mobile robot forest mapping scenario. Section 5 'Synthetic Experiments' and Section 6 'Forest Experiment' describe the evaluation of the proposed method against baselines on various problem instances and a real-world robotics problem. |
| Researcher Affiliation | Collaboration | The authors are affiliated with 'University of Oxford' (academic) and 'UK Atomic Energy Authority' (government research/industry). This mix indicates a collaborative affiliation. |
| Pseudocode | Yes | The paper contains structured pseudocode blocks titled 'Algorithm 1: Belief Tree Search' and 'Algorithm 2: DetMCVI Backup'. |
| Open Source Code | Yes | The paper states: 'Our implementation is found at http://github.com/ori-goals/DetMCVI.' |
| Open Datasets | No | The paper describes synthetic problem domains like CTP, Wumpus World, Maze, and Sort, and a 'map generated from operator-guided navigation in a forest' for a real-world experiment. However, it does not provide concrete access information (link, DOI, repository, or formal citation) for any publicly available or open datasets used or generated. |
| Dataset Splits | No | The paper mentions evaluating policies using '10^5 trials from states randomly sampled from the initial belief', with performance calculated 'over sets of 10 problem instances for the CTP and Maze problems, and over three random seeds for Wumpus and Sort.' It also describes a 'Belief Downsampling' strategy using N=10^5 samples. However, it does not specify explicit training/test/validation splits for a static dataset; instead it describes how problem instances and initial beliefs are generated or sampled for evaluation. |
| Hardware Specification | No | The paper mentions the 'ANYbotics ANYmal D' in the Forest Experiment, which is the mobile robot used in the scenario, not the hardware used to run the experiments (e.g., training or policy generation). No specific GPU, CPU models, or other computing hardware specifications are provided for the experimental setup. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers required to replicate the experiments. Implementation details are mentioned as being in the appendix, but no versions are specified in the main text. |
| Experiment Setup | Yes | The paper specifies experimental parameters such as 'We impose a domain-dependent horizon T to shorten computation time for practicality' and for 'Belief Downsampling...We use N = 10^5'. These are concrete settings used in the experiments. |
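The evaluation protocol quoted above (estimating a policy's performance over 10^5 trials, each starting from a state sampled from the initial belief) can be sketched as a generic Monte Carlo evaluation loop. This is an illustrative sketch only, not the authors' implementation: the function names `sample_initial_state` and `run_trial` are hypothetical placeholders for the domain's initial-belief sampler and a single policy rollout.

```python
import random

def evaluate_policy(sample_initial_state, run_trial, n_trials=10**5, seed=0):
    """Monte Carlo policy evaluation.

    sample_initial_state(rng) -> a state drawn from the initial belief
    run_trial(state) -> (reached_goal: 0 or 1, trial_return: float)

    Returns the empirical success rate and mean return over n_trials.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    successes = 0
    total_return = 0.0
    for _ in range(n_trials):
        state = sample_initial_state(rng)
        reached_goal, trial_return = run_trial(state)
        successes += reached_goal
        total_return += trial_return
    return successes / n_trials, total_return / n_trials
```

Averaging over independently sampled initial states is what makes the estimate meaningful for a DetPOMDP: the dynamics are deterministic, so all evaluation variance comes from the initial belief.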