An Oracle-Guided Approach to Constrained Policy Synthesis Under Uncertainty

Authors: Roman Andriushchenko, Milan Češka, Filip Macák, Sebastian Junges, Joost-Pieter Katoen

JAIR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental
This section presents an experimental evaluation of our approach. It mainly focuses on three questions: Q1: Is our approach able to solve interesting problems from different planning and decision-making domains? To answer this, we consider three case studies: i) synthesising a power manager from a rough sketch, ii) constrained planning in a maze POMDP, and iii) solving a meeting-grid Dec-POMDP. We discuss and interpret the resulting policies.
Researcher Affiliation: Academia
Roman Andriushchenko (EMAIL), Milan Češka (EMAIL), Filip Macák (EMAIL): Brno University of Technology, Brno, Czech Republic; Sebastian Junges (EMAIL): Radboud University, Nijmegen, The Netherlands; Joost-Pieter Katoen (EMAIL): RWTH Aachen University, Aachen, Germany
Pseudocode: Yes
Algorithm 1 summarises the key steps of the hybrid synthesis algorithm for the maximal reach-probability problem (see Section 2.3). We start (l.1) by creating the sub-space stack (initially containing only the entire family) and construct the formula φ describing each unexplored parameter assignment. In each iteration of the main loop, we explore exactly one sub-space. We first run one iteration of AR (l.4), which may update the current optimum δ. This call also yields reachability-probability bounds for the sub-space. If this one iteration proves that the sub-space does not contain a better solution than the (updated) δ, we continue with the next sub-space. Otherwise, we set the time limit for CEG (l.6) based on the previous performance of AR and CEG. When exploring the sub-space, CEG uses the obtained bounds to construct counterexamples modulo bounds. CEG terminates either by exploring all assignments of the sub-space or by reaching the allocated time limit. In the latter case (l.9), we split the sub-space into sub-families to be analysed in subsequent iterations. To balance AR vs. CEG reasoning, we track the numbers η_AR and η_CEG of assignments pruned per time unit by AR and CEG, respectively. Let τ_AR and τ_CEG denote the current total run-times of AR and CEG; then, on l.6 we allocate τ_AR · η_CEG / η_AR − τ_CEG time units to CEG to explore the sub-space. The idea is to let AR and CEG run for equal portions of time when both are equally successful, and to allocate more time to CEG when it outpaces AR in terms of the number of pruned assignments, and vice versa.
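The balancing rule can be sketched in a few lines of Python. The function name is hypothetical and the allocation formula τ_AR · η_CEG / η_AR − τ_CEG is reconstructed from the prose description above (equal pruning rates should yield equal total run-times), not taken from PAYNT's actual code:

```python
def ceg_time_budget(tau_ar: float, tau_ceg: float,
                    eta_ar: float, eta_ceg: float) -> float:
    """Time units to allocate to CEG for the next sub-space.

    tau_ar, tau_ceg: total run-time spent so far by AR and CEG;
    eta_ar, eta_ceg: assignments pruned per time unit by each engine.
    When both engines prune equally fast (eta_ar == eta_ceg), the
    budget equals tau_ar - tau_ceg, i.e. it steers the two engines
    towards equal total run-time; a relatively faster CEG receives
    proportionally more time, and vice versa.
    """
    return tau_ar * eta_ceg / eta_ar - tau_ceg

# Equal pruning rates: CEG is topped up to match AR's total run-time.
print(ceg_time_budget(tau_ar=10.0, tau_ceg=4.0, eta_ar=2.0, eta_ceg=2.0))  # 6.0
# CEG prunes twice as fast as AR: it receives a larger share.
print(ceg_time_budget(tau_ar=10.0, tau_ceg=4.0, eta_ar=1.0, eta_ceg=2.0))  # 16.0
```

A negative budget would simply mean CEG has already consumed its fair share, in which case the next iteration falls back to AR alone.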
Open Source Code: Yes
Availability: The version of PAYNT used for the evaluation, as well as all benchmarks, are publicly available (Andriushchenko, Češka, Junges, Katoen, & Macák, 2024).
Open Datasets: Yes
We evaluate our approach on traditional CPOMDP benchmarks: single-objective POMDPs augmented with costs for low-reward actions, taken from (Poupart et al., 2015). To showcase that our approach is competitive for Dec-POMDP solving, we compare PAYNT with Inf-JESP on traditional Dec-POMDP benchmarks taken from http://masplan.org/problem_domains (Recycling, Grid3x3, Dec-Tiger, Box Pushing), as well as on the already mentioned meeting-grid Dec-POMDP.
Dataset Splits: No
The paper does not explicitly provide training/test/validation dataset splits or cross-validation details. It refers to 'traditional CPOMDP benchmarks' and 'traditional Dec-POMDP benchmarks' and states that 'The timeout for each experiment was 2 hours. We report the values for the policy that achieved the best average discounted reward (ADR) within this timeout.' without further specifying how the data within these benchmarks is split or used.
Hardware Specification: Yes
Hardware: All experiments are run on a single core of a machine equipped with an Intel i5-12600KF @ 4.9 GHz CPU with 64 GB RAM.
Software Dependencies: No
The paper mentions specific tools such as Z3 and CVC5 in Section 5 ('SMT solver (we use Z3 (de Moura & Bjørner, 2008) or CVC5 (Barbosa et al., 2022))') and PRISM in Section 6 and elsewhere ('PRISM guarded-command language (Kwiatkowska et al., 2011)'). However, it does not provide version numbers for these software components. The only version mentioned is 'PRISM 4.0' in the bibliography, not in the context of the version used for the experiments.
Experiment Setup: Yes
The objective of the specification Φ is to minimise the power consumption during the operation time while ensuring that the expected number of lost requests is at most one. The PM cannot fully observe the request queue. Instead, it can only distinguish four occupancy levels determined by the threshold parameters T1 ∈ {0, 0.1, 0.2, 0.3, 0.4}, T3 ∈ {0.6, 0.7, 0.8, 0.9}, and a fixed threshold T2 = 0.5. As the power consumption depends on the queue size, we aim to obtain the best power mode for each occupancy level. This is represented by the parameter Pi (for occupancy level i) whose domain is {sleep, idle, active}. The maximal capacity QMAX ranges over {1, 2, …, 10}. ... Within about 3 minutes, our approach finds the following optimal solution: QMAX = 6, T1 = 0.1, T3 = 0.9, P1 = idle, P2 = P3 = P4 = active. For fixed k, the goal ϕ0 is to find the k-FSC (finite-state deterministic controller with k memory nodes; see the formal definition in App. B) that maximises the reachability probability to G. Additionally, we impose two antagonistic constraints: ϕ1 := the expected number of visits to the yellow cells with coins (denoted Ey) is at least 10, and ϕ2 := the expected number of visits to the "danger" red cells (denoted Er) is below 7. The timeout for the evaluation of the engines was 30 minutes. On the model Pole, there is a maximising expected-reward specification which is not supported by the CEG engine (and therefore by the Hybrid engine), so their times are missing. The timeout for the previous implementation of PAYNT was set to 2 hours.
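The parameter domains of the power-manager sketch determine a finite family of candidate designs. The enumeration below is an illustrative back-of-the-envelope count, assuming every combination of the listed domains yields one distinct candidate (the synthesiser prunes this space symbolically rather than enumerating it); variable names mirror the sketch parameters:

```python
from itertools import product

# Parameter domains of the power-manager sketch as listed above.
T1_DOMAIN = [0, 0.1, 0.2, 0.3, 0.4]
T3_DOMAIN = [0.6, 0.7, 0.8, 0.9]
POWER_MODES = ["sleep", "idle", "active"]   # domain of each P_i
QMAX_DOMAIN = range(1, 11)                  # {1, 2, ..., 10}

# One power-mode choice per occupancy level P1..P4.
family = product(T1_DOMAIN, T3_DOMAIN, QMAX_DOMAIN,
                 POWER_MODES, POWER_MODES, POWER_MODES, POWER_MODES)
print(sum(1 for _ in family))  # 5 * 4 * 10 * 3**4 = 16200 candidates
```

Even this modest sketch thus spans over sixteen thousand designs, which is why exhaustive enumeration is replaced by the AR/CEG reasoning described earlier.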