Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Stick-Breaking Policy Learning in Dec-POMDPs
Authors: Miao Liu, Christopher Amato, Xuejun Liao, Lawrence Carin, Jonathan P. How
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods. |
| Researcher Affiliation | Academia | Miao Liu (MIT, Cambridge, MA); Christopher Amato (University of New Hampshire, Durham, NH); Xuejun Liao and Lawrence Carin (Duke University, Durham, NC); Jonathan P. How (MIT, Cambridge, MA) |
| Pseudocode | Yes | Algorithm 1 Batch VB Inference for Dec-SBPR |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | Downloaded from http://rbr.cs.umass.edu/camato/decpomdp/download.html |
| Dataset Splits | No | The paper mentions using 'K = 300 episodes' for learning and '100 test episodes' for evaluation, but it does not specify explicit train/validation/test splits by percentages or counts, nor does it explicitly mention a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | For Dec-SBPR, the hyperparameters in (8) are set to c = 0.1 and d = 10^-6 to promote sparse usage of FSC nodes. The policies are initialized as FSCs converted from the episodes with the highest rewards using a method similar to [Amato and Zilberstein, 2009]. |
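The hyperparameter setting above can be illustrated with a minimal sketch: in a truncated stick-breaking construction, stick proportions drawn from Beta(c, d) with c = 0.1 and d = 10^-6 concentrate nearly all weight on a few components, which is what "sparse usage of FSC nodes" refers to. This is an illustrative construction, not the paper's exact Dec-SBPR model; the function name and truncation level are assumptions.

```python
import numpy as np

def stick_breaking_weights(c, d, n_nodes, rng):
    """Draw component weights via a truncated stick-breaking construction.

    Each stick proportion v_k ~ Beta(c, d); the k-th weight is
    v_k times the length of stick left over by the first k-1 breaks.
    """
    v = rng.beta(c, d, size=n_nodes)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(0)
# Hyperparameters from the paper's reported setup: c = 0.1, d = 1e-6.
w = stick_breaking_weights(0.1, 1e-6, n_nodes=50, rng=rng)
# With d this small, only a handful of nodes carry nearly all the mass.
print(np.sum(w > 1e-3))
```

Because the Beta mean c/(c+d) is close to 1 when d is tiny, the first few sticks absorb almost the entire unit mass, so most of the 50 candidate nodes receive negligible weight.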