Multiple Mean-Payoff Optimization Under Local Stability Constraints
Authors: David Klaška, Antonín Kučera, Vojtěch Kůr, Vít Musil, Vojtěch Řehák
AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we design and evaluate the first efficient and scalable solution to this problem applicable to Markov decision processes. Our Contribution. We design the first efficient and scalable strategy synthesis algorithm for optimizing multiple window mean payoffs in a given Markov decision process. Our experiments show that the algorithm can construct high-quality (and sometimes quite sophisticated) strategies for complicated instances of non-trivial size. In our experiments, we concentrate on the following: (1) Demonstrating the improvement achieved by the dynamic procedure described above, where the baseline is a naive DFS-based procedure. (2) Evaluating the scalability of our algorithm and the quality of the constructed strategies. (3) Analyzing the power of randomization to compensate for insufficient memory. |
| Researcher Affiliation | Academia | Masaryk University, Brno, Czechia EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Evaluation via dynamic programming; Algorithm 2: Evaluation via DFS |
| Open Source Code | Yes | Code https://gitlab.fi.muni.cz/formela/2025-aaai-mmp |
| Open Datasets | No | The paper uses |
| Dataset Splits | No | The paper describes constructing graphs Dℓ for experiments but does not provide details on how data within these graphs is split for training, validation, or testing, as it's not a typical supervised learning setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | Our implementation uses PYTORCH framework (Paszke et al. 2019) and its automatic differentiation with ADAM optimizer (Kingma and Ba 2015). While PyTorch and ADAM are mentioned, specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | For all scenarios (ℓ, d) and K {1, . . . , 5}, we invoked WINMPSYNT 100 times, where K memory states are assigned to every vertex. The number of optimization steps is set to 1000. |