Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs

Authors: Maris F. L. Galesloot, Roman Andriushchenko, Milan Ceska, Sebastian Junges, Nils Jansen

IJCAI 2025

Reproducibility assessment (Variable | Result | LLM Response):
Research Type | Experimental | The empirical evaluation shows that, compared to various baselines, our approach (1) produces policies that are more robust and generalize better to unseen POMDPs, and (2) scales to HM-POMDPs that consist of over a hundred thousand environments. ... 5 Experimental Evaluation In this section, we evaluate RFPG on the following questions. (Q1) Does RFPG produce policies with higher robust performance compared to several baselines? (Q2) Can RFPG generalize to unseen environments? (Q3) How does the POMDP selection affect performance?
Researcher Affiliation | Academia | 1Radboud University Nijmegen, The Netherlands; 2Brno University of Technology, Czechia; 3Ruhr-University Bochum, Germany; EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: The RFPG algorithm
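The report only names "Algorithm 1: The RFPG algorithm" without reproducing it. As a rough, unofficial sketch consistent with the paper's emphasis on POMDP selection (Q3), one plausible robust policy-gradient step evaluates the current policy on the candidate POMDPs and ascends the gradient on the worst-case instance. All names here (`estimate_value`, `policy_gradient`) are hypothetical placeholders, not the paper's actual API:

```python
def rfpg_step(policy_params, pomdps, estimate_value, policy_gradient, lr=0.1):
    """One hypothetical robust policy-gradient step: score each candidate
    POMDP under the current policy, pick the worst-case (lowest-value)
    instance, and take a gradient-ascent step on that instance."""
    values = [estimate_value(policy_params, m) for m in pomdps]
    worst = min(range(len(pomdps)), key=values.__getitem__)
    grad = policy_gradient(policy_params, pomdps[worst])
    new_params = [p + lr * g for p, g in zip(policy_params, grad)]
    return new_params, worst
```

This is only a sketch of the worst-case-selection idea; the actual RFPG algorithm (finite-memory parameterization, gradient estimator, selection rule) is given in the paper itself.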
Open Source Code | Yes | Code is on Zenodo (https://doi.org/10.5281/zenodo.15479642) and the paper with appendix is on arXiv [Galesloot et al., 2025].
Open Datasets | Yes | We extend four POMDP benchmarks [Littman et al., 1997; Norman et al., 2017; Qiu et al., 1999] and one family of MDPs [Andriushchenko et al., 2024] to HM-POMDPs. These benchmarks together encompass a varied selection of different complexities of HM-POMDPs, i.e., different numbers of POMDPs and sizes thereof, as reported in Table 1. Appendix C gives a detailed description of the benchmarks.
Dataset Splits | Yes | (1) Pick a random subset of ten POMDPs from the full HM-POMDP, (2) compute a robust policy for this smaller sub-HM-POMDP using the four baselines and RFPG (referred to as RFPG-S), (3) compare the achieved robust performance of RFPG to the baselines on this sub-HM-POMDP (Q1). ... (5) compare the robust performance of the resulting six policies on the full HM-POMDP using the policy evaluation method from Section 4.3. From this experiment, we can not only assess the scalability of our approach compared to the baselines but, moreover, the ability to generalize to unseen environments (Q2). Additionally, we can see if RFPG produces a better robust performance than RFPG-S, indicating whether it is essential to assess all POMDPs within an HM-POMDP. ... To report statistically significant results, each experiment was carried out on 10 different subsets obtained using stratified sampling from the full HM-POMDP.
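The quoted protocol draws 10 different ten-POMDP subsets via stratified sampling from the full HM-POMDP. The paper excerpt does not state the stratification criterion, so the `strata` mapping below is a hypothetical grouping; the sketch samples proportionally from each stratum and tops up to the target subset size:

```python
import random

def stratified_subsets(pomdp_ids, strata, subset_size=10, n_subsets=10, seed=0):
    """Draw `n_subsets` subsets of `subset_size` POMDP ids, sampling from
    each stratum in proportion to its size (stratification criterion is a
    hypothetical stand-in; the paper does not specify it)."""
    rng = random.Random(seed)
    by_stratum = {}
    for pid in pomdp_ids:
        by_stratum.setdefault(strata[pid], []).append(pid)
    subsets = []
    for _ in range(n_subsets):
        picked = []
        for members in by_stratum.values():
            # share of the subset proportional to the stratum's size
            k = max(1, round(subset_size * len(members) / len(pomdp_ids)))
            picked.extend(rng.sample(members, min(k, len(members))))
        # top up if proportional rounding undershot the target size
        remaining = [p for p in pomdp_ids if p not in picked]
        while len(picked) < subset_size and remaining:
            picked.append(remaining.pop(rng.randrange(len(remaining))))
        subsets.append(picked[:subset_size])
    return subsets
```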
Hardware Specification | No | The paper mentions "Appendix D provides information on the infrastructure used to run the experiments." However, the provided text does not contain specific hardware details such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper refers to tools like PAYNT [Andriushchenko et al., 2021] and SAYNT [Andriushchenko et al., 2023] but does not specify their version numbers or other software dependencies with versions.
Experiment Setup | Yes | GASTEPS is a hyperparameter that should be tuned based on the size of the HM-POMDP: having many instances |I| slows down the policy evaluation, while many states |S| slows down the gradient update steps. In our experiments, we picked GASTEPS = 10, such that at most 75% of the computation time is spent on policy evaluation. ... All methods have a one-hour timeout to compute a policy; in case of a timeout, we report the robust performance of a uniform random policy.
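The quoted setup alternates GASTEPS gradient-ascent steps with a full robust policy evaluation, choosing GASTEPS = 10 so that evaluation consumes at most 75% of the computation time, under a one-hour timeout. A minimal sketch of that alternation, with hypothetical `gradient_step`/`evaluate_robust` callbacks, which also measures the evaluation-time fraction:

```python
import time

def run_with_budget(gradient_step, evaluate_robust, policy,
                    gasteps=10, timeout_s=3600.0):
    """Alternate `gasteps` gradient updates with one robust evaluation
    until the timeout, tracking the wall-clock fraction spent evaluating.
    The callbacks are hypothetical stand-ins, not the paper's API."""
    t_grad = t_eval = 0.0
    start = time.monotonic()
    best_value, best_policy = float("-inf"), policy
    while time.monotonic() - start < timeout_s:
        t0 = time.monotonic()
        for _ in range(gasteps):
            policy = gradient_step(policy)
        t_grad += time.monotonic() - t0

        t0 = time.monotonic()
        value = evaluate_robust(policy)  # robust value across all instances
        t_eval += time.monotonic() - t0

        if value > best_value:
            best_value, best_policy = value, policy
    total = t_grad + t_eval
    eval_fraction = t_eval / total if total else 0.0
    return best_value, best_policy, eval_fraction
```

In the paper's terms, a larger instance count |I| makes `evaluate_robust` slower (pushing `eval_fraction` up), while a larger state space |S| makes `gradient_step` slower, which is why GASTEPS has to be tuned per HM-POMDP.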