Revelations: A Decidable Class of POMDPs with Omega-Regular Objectives

Authors: Marius Belly, Nathanaël Fijalkow, Hugo Gimbert, Florian Horn, Guillermo A. Pérez, Pierre Vandenhove

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a comparison between our algorithm and off-the-shelf deep reinforcement learning (DRL) trained via an observation wrapper. As we show in the paper, the MDP induced by the belief supports carries sufficient information to play in revealing POMDPs; hence, we used a wrapper implementing a subset construction on the fly to generate the current belief support, and focused on algorithms intended for MDPs. Despite moderate effort on reward engineering and hyperparameter tuning, we were unable to match the performance of our algorithm (see Figure 3).
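The observation wrapper described above maintains the current belief support via an on-the-fly subset construction. A minimal sketch of that update step, assuming an illustrative encoding of the POMDP's positive-probability transitions (the data types and names here are my own, not the authors' code):

```python
# Hypothetical sketch: one step of on-the-fly belief-support tracking for a
# POMDP. Only the supports matter, so exact probabilities are not needed:
# we record, per (state, action), the set of (next_state, observation) pairs
# that occur with positive probability.
from typing import Dict, FrozenSet, Tuple

Trans = Dict[Tuple[str, str], FrozenSet[Tuple[str, str]]]

def update_support(support: FrozenSet[str], action: str, obs: str,
                   trans: Trans) -> FrozenSet[str]:
    """Subset construction: keep every state reachable from some state in
    the current support via `action` while emitting observation `obs`."""
    return frozenset(
        s2
        for s in support
        for (s2, o) in trans.get((s, action), frozenset())
        if o == obs
    )
```

In a tiger-like revealing POMDP, a revealing observation collapses the support to a singleton, which is why the induced belief-support MDP suffices for the DRL baselines.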
Researcher Affiliation | Academia | (1) CNRS, LaBRI, Université de Bordeaux, France; (2) CNRS, IRIF, Université de Paris, France; (3) University of Antwerp - Flanders Make, Antwerp, Belgium
Pseudocode | No | The paper describes theoretical results and algorithms, but it does not contain a clearly labeled pseudocode or algorithm block with structured steps.
Open Source Code | Yes | Code: https://github.com/gaperez64/pomdps-reveal
Open Datasets | Yes | Here, we depict this value, per step (from 1 to 500), over 500 simulations of a revealing version of the classical tiger POMDP (Cassandra, Kaelbling, and Littman 1994). The example used will be discussed in Section 5, Example 2. ... We give an example of a strongly revealing POMDP inspired by the tiger POMDP of Cassandra, Kaelbling, and Littman (1994), depicted in Figure 5. This example was used in Figure 3 in the introduction; the code to generate it in our tool is provided in (Belly et al. 2024, Appendix A).
Dataset Splits | No | The paper discusses simulations and evaluation of strategies in POMDPs, which do not typically involve dataset splits like those in supervised learning. It mentions "500 simulations" but not a training/validation/test split of a dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | A2C, DQN, and PPO are (MlpPolicy) strategies obtained from the stable-baselines library (Raffin et al. 2021). The specific version number of stable-baselines is not provided.
Experiment Setup | No | A2C, DQN, and PPO are (MlpPolicy) strategies obtained from the stable-baselines library (Raffin et al. 2021), trained (for a total of 10k time steps) with default parameter values using a simple reward scheme: a good event yields a reward of 100; a bad one, 1. While the training duration and reward scheme are mentioned, the specific 'default parameter values' of the DRL algorithms are not explicitly stated.
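The training setup quoted above can be sketched as follows. Only the library calls, the 10k-step budget, the MlpPolicy choice, and the good/bad reward values come from the text; the environment argument stands in for the authors' belief-support observation wrapper, so this is an assumption-laden sketch, not their script.

```python
def train_baselines(env, total_timesteps=10_000):
    """Train the three DRL baselines with default hyperparameters,
    as described in the row above. `env` is assumed to be a Gym-style
    environment wrapping the POMDP with its belief-support observations."""
    # Imported lazily so the dependency-free reward helper below can be
    # used without stable-baselines3 installed.
    from stable_baselines3 import A2C, DQN, PPO

    models = {}
    for algo in (A2C, DQN, PPO):
        model = algo("MlpPolicy", env, verbose=0)   # default parameter values
        model.learn(total_timesteps=total_timesteps)  # "10k time steps"
        models[algo.__name__] = model
    return models

def event_reward(good_event: bool) -> float:
    """Reward scheme quoted above: a good event yields 100; a bad one, 1."""
    return 100.0 if good_event else 1.0
```

Note that with this scheme both events give positive reward, which is one plausible reason the quoted reward engineering effort mattered.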