Multiple Mean-Payoff Optimization Under Local Stability Constraints

Authors: David Klaška, Antonín Kučera, Vojtěch Kůr, Vít Musil, Vojtěch Řehák

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we design and evaluate the first efficient and scalable solution to this problem applicable to Markov decision processes." "Our Contribution. We design the first efficient and scalable strategy synthesis algorithm for optimizing multiple window mean payoffs in a given Markov decision process. Our experiments show that the algorithm can construct high-quality (and sometimes quite sophisticated) strategies for complicated instances of non-trivial size." "In our experiments, we concentrate on the following: (1) Demonstrating the improvement achieved by the dynamic procedure described above, where the baseline is a naive DFS-based procedure. (2) Evaluating the scalability of our algorithm and the quality of the constructed strategies. (3) Analyzing the power of randomization to compensate for insufficient memory."
Researcher Affiliation | Academia | Masaryk University, Brno, Czechia (EMAIL, EMAIL)
Pseudocode | Yes | Algorithm 1: Evaluation via dynamic programming; Algorithm 2: Evaluation via DFS
Open Source Code | Yes | https://gitlab.fi.muni.cz/formela/2025-aaai-mmp
Open Datasets | No | The paper uses synthetically constructed graphs Dℓ for its experiments rather than publicly available datasets.
Dataset Splits | No | The paper describes constructing graphs Dℓ for experiments but does not provide details on how data within these graphs is split for training, validation, or testing, as it is not a typical supervised learning setup.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | "Our implementation uses PYTORCH framework (Paszke et al. 2019) and its automatic differentiation with ADAM optimizer (Kingma and Ba 2015)." While PyTorch and ADAM are mentioned, specific version numbers for these software components are not provided.
Experiment Setup | Yes | "For all scenarios (ℓ, d) and K ∈ {1, . . . , 5}, we invoked WINMPSYNT 100 times, where K memory states are assigned to every vertex. The number of optimization steps is set to 1000."
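The report only names the paper's two evaluation procedures (Algorithm 1: dynamic programming; Algorithm 2: DFS) without reproducing them. The following is a hypothetical Python sketch of the naive direct check, assuming the standard window mean-payoff "good window" condition: a position of a run is good for threshold t and window bound d if some window of at most d steps starting there has average weight at least t. The function name `good_positions` and the flat-list encoding of edge weights are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch (not the authors' implementation).
# w    : list of edge weights along a finite run
# d    : window length bound
# t    : mean-payoff threshold
# A position i is "good" if some window of l <= d steps starting at i
# has average weight >= t, i.e. sum of the l weights >= t * l.

def good_positions(w, d, t):
    """Return the set of positions that open a good window (direct check)."""
    n = len(w)
    good = set()
    for i in range(n):
        total = 0.0
        for l in range(1, min(d, n - i) + 1):
            total += w[i + l - 1]          # running sum of the window
            if total >= t * l:             # average over l steps >= t
                good.add(i)
                break                      # shortest good window suffices
    return good
```

This direct per-position scan corresponds in spirit to the DFS-style baseline; the paper's dynamic-programming variant (Algorithm 1) avoids re-scanning overlapping windows, which is what the reported speedup measures.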
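The experiment-setup row can be read as a restart loop. The sketch below encodes only the numbers stated in the paper's text (100 invocations per configuration, K ∈ {1, . . . , 5} memory states, 1000 optimization steps); the function `winmpsynt` is a placeholder standing in for the authors' WINMPSYNT synthesizer, whose real interface is not described in the report.

```python
# Hypothetical reconstruction of the reported protocol (not the authors' code).
import random

RESTARTS = 100      # independent WINMPSYNT invocations per configuration
OPT_STEPS = 1000    # optimization steps per invocation (per the paper)

def winmpsynt(instance, memory_states, steps, rng):
    """Placeholder for the synthesizer: returns a mock strategy quality."""
    return rng.random()

def run_configuration(instance, k, rng):
    # Keep the best strategy found across all random restarts.
    return max(winmpsynt(instance, k, OPT_STEPS, rng) for _ in range(RESTARTS))

rng = random.Random(0)
# One result per memory bound K in {1, ..., 5}, as in the paper's setup.
results = {k: run_configuration("D_l", k, rng) for k in range(1, 6)}
```

Taking the maximum over restarts reflects the usual practice for gradient-based synthesis with random initialization, which matches the paper's stated use of randomized restarts to compensate for local optima.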