Formally Verified Approximate Policy Iteration
Authors: Maximilian Schäffeler, Mohammad Abdulaziz
AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation on benchmark problems shows that it is practical. As part of the development, we build verified software to certify linear programming solutions. We discuss the verification process and the modifications we made to the algorithm during formalization. The results of the experiments (Table 1) show that the algorithm can deal with ring networks of half a million states and 20 actions. |
| Researcher Affiliation | Academia | 1Technische Universit at M unchen, Germany 2King s College London, United Kingdom EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Approximate Policy Iteration (API) api(t, π, w) := if t tmax or err ϵ or w= then (t, π , w , err, w=) else api(t + 1, π , w ) where w := upd w(π) π := greedy π(w ) err := factored err(π , w ) w= := w = w |
| Open Source Code | Yes | Code https://github.com/schaeffm/fmdp isabelle |
| Open Datasets | No | The paper mentions using "the ring and star topologies from (Guestrin et al. 2003)" for evaluation. This refers to problem domains or models described in a referenced paper, not publicly available datasets in the conventional sense (e.g., raw data files). The text does not provide concrete access information (link, DOI, or repository) for any specific dataset. |
| Dataset Splits | No | The paper evaluates an algorithm on "ring and star topologies" from a benchmark paper. It does not use "datasets" in the typical machine learning sense with explicit training/test/validation splits. Therefore, information about dataset splits is not applicable and not provided. |
| Hardware Specification | Yes | We run our implementation on an Intel i7-11800H CPU and set the discount factor to 0.9 in all our experiments. |
| Software Dependencies | No | The paper mentions using 'Isabelle/HOL' for development, and 'QSopt ex' and 'So Plex' as LP solvers. It also states code is exported to Scala. However, no specific version numbers for the utilized software dependencies (e.g., Isabelle/HOL version, So Plex version, Scala version) are provided within the main text that are relevant for replication. |
| Experiment Setup | Yes | We run our implementation on an Intel i7-11800H CPU and set the discount factor to 0.9 in all our experiments. In all runs, the weights converged after max. 5 iterations. |