reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Solving Robust Markov Decision Processes: Generic, Reliable, Efficient

Authors: Tobias Meggendorfer, Maximilian Weininger, Patrick Wienhöft

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our prototype implementation outperforms existing tools by several orders of magnitude and can solve RMDPs with a million states in under a minute. 6 Experimental Evaluation We implemented a prototype in Java, based on PET (Meggendorfer and Weininger 2024). For linear optimization, we use the (pure Java) library oj! Algorithms. We ran our experiments on a machine with standard hardware (AMD Ryzen 5 CPU, 16GB RAM) running Linux Mint OS and using Open JDK 21 as JRE. Our tool, its source code, all models, and instructions to replicate all results can be found at (Meggendorfer 2024). Experimental Results Table 1 shows that our approach massively outperforms RVI, RRVI, and RRPI by several orders of magnitude. In particular, it seems that the runtime of their approaches grow signiﬁcantly faster than ours. For example, going from lake-10-M to lake-15-M increases the runtime of RPPI by a factor of about 50, while ours only increases by a factor of 5. We believe this is partly due to implementation inefﬁciencies, but more importantly due the exponential space requirement of constructing the induced SG explicitly. In particular, just building the game structure for lake-100-U in their implementation takes 200s, and crashes with a memory-out on cont-100. In Table 2, we compare to the approach of PRISM. Our approach has runtimes in the same order of magnitude, with differences likely due to implementation details. However, recall that by design our tool needs to work more, as it provides guarantees. Thus, these results demonstrate that additionally obtaining guarantees via our approach does not drastically increase the runtime and scales to signiﬁcantly sized models.
Researcher Affiliation	Academia	1Lancaster University Leipzig, Leipzig, Germany 2Institute of Science and Technology Austria, Klosterneuburg, Austria 3Dresden University of Technology, Dresden, Germany
Pseudocode	Yes	Algorithm 1 Best-Effort Implicit Value Iteration for RMDP Algorithm 2 Bounded Value Iteration for RMDP
Open Source Code	Yes	Code https://zenodo.org/records/14385450 Our tool, its source code, all models, and instructions to replicate all results can be found at (Meggendorfer 2024).
Open Datasets	Yes	Finally, we modiﬁed several standard models from (Hartmanns et al. 2019) by adding rectangular constraints on selected transitions or adding Lp balls around all transitions. All considered (and further) models are included in the artefact. Our tool, its source code, all models, and instructions to replicate all results can be found at (Meggendorfer 2024).
Dataset Splits	No	The paper mentions using
Hardware Specification	Yes	We ran our experiments on a machine with standard hardware (AMD Ryzen 5 CPU, 16GB RAM) running Linux Mint OS and using Open JDK 21 as JRE.
Software Dependencies	No	We implemented a prototype in Java, based on PET (Meggendorfer and Weininger 2024). For linear optimization, we use the (pure Java) library oj! Algorithms. We ran our experiments on a machine with standard hardware (AMD Ryzen 5 CPU, 16GB RAM) running Linux Mint OS and using Open JDK 21 as JRE.
Experiment Setup	Yes	Guarantees and Runtime. We remark that RVI and RRVI as well as PRISM do not give a practical stopping criterion. For the former two, the implementation from (Chatterjee et al. 2024) aids them by stopping once the iterates are sufﬁciently close to the correct value, while PRISM stops once the iterates do not changes much between steps. Notably, PRISM s heuristic can indeed lead to stopping early and wrongly, see (Haddad and Monmege 2018). In contrast, our approach produces converging lower and upper bounds and thus also provides a correct stopping criterion. Computing both bounds until achieving precision of ε (10 6 in our experiments) naturally requires more effort than just working with one side and stopping at the correct time via an oracle or heuristics: Model Ours RVI RRVI RPPI cont-50 <1 223 <1 30 cont-75 <1 700 <1 70 cont-100 <1 M/O cont-125 1 M/O lake-10-U <1 T/O 7 lake-10-M <1 26 lake-15-U <1 T/O 136 lake-15-M 2 T/O lake-100-U 6 T/O T/O lake-100-M 12 T/O Table 1: Comparison of our approach for maximizing LRA to the approaches implemented in (Chatterjee et al. 2024) on the models used in that paper. We report solving times (excluding model building / parsing) in seconds. A dash indicates the approach does not support a particular model. T/O denotes a timeout of 15 minutes, M/O a memory-out crash. lake-n are their frozenlake models of size n n, the sufﬁx U or M indicates the unichain or multichain variant. cont-n are the contamination models with n states.