Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Stick-Breaking Policy Learning in Dec-POMDPs
Authors: Miao Liu, Christopher Amato, Xuejun Liao, Lawrence Carin, Jonathan P. How
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods. |
| Researcher Affiliation | Academia | Miao Liu (MIT, Cambridge, MA); Christopher Amato (University of New Hampshire, Durham, NH); Xuejun Liao and Lawrence Carin (Duke University, Durham, NC); Jonathan P. How (MIT, Cambridge, MA) |
| Pseudocode | Yes | Algorithm 1 Batch VB Inference for Dec-SBPR |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | Downloaded from http://rbr.cs.umass.edu/camato/decpomdp/download.html |
| Dataset Splits | No | The paper mentions using 'K = 300 episodes' for learning and '100 test episodes' for evaluation, but it does not specify explicit train/validation/test splits by percentages or counts, nor does it explicitly mention a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | For Dec-SBPR, the hyperparameters in (8) are set to c = 0.1 and d = 10^-6 to promote sparse usage of FSC nodes. The policies are initialized as FSCs converted from the episodes with the highest rewards using a method similar to [Amato and Zilberstein, 2009]. |
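The hyperparameter setting above can be illustrated with a minimal sketch: in a truncated stick-breaking construction, stick proportions drawn from Beta(c, d) with c = 0.1 and d = 10^-6 concentrate nearly all weight on a few components, which is what "sparse usage of FSC nodes" refers to. This is an illustrative construction, not the paper's exact Dec-SBPR model; the function name and truncation level are assumptions.

```python
import numpy as np

def stick_breaking_weights(c, d, n_nodes, rng):
    """Draw component weights via a truncated stick-breaking construction.

    Each stick proportion v_k ~ Beta(c, d); the k-th weight is
    v_k times the length of stick left over by the first k-1 breaks.
    """
    v = rng.beta(c, d, size=n_nodes)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(0)
# Hyperparameters from the paper's reported setup: c = 0.1, d = 1e-6.
w = stick_breaking_weights(0.1, 1e-6, n_nodes=50, rng=rng)
# With d this small, only a handful of nodes carry nearly all the mass.
print(np.sum(w > 1e-3))
```

Because the Beta mean c/(c+d) is close to 1 when d is tiny, the first few sticks absorb almost the entire unit mass, so most of the 50 candidate nodes receive negligible weight.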