Formally Verified Approximate Policy Iteration

Authors: Maximilian Schäffeler, Mohammad Abdulaziz

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our evaluation on benchmark problems shows that it is practical. As part of the development, we build verified software to certify linear programming solutions. We discuss the verification process and the modifications we made to the algorithm during formalization. The results of the experiments (Table 1) show that the algorithm can deal with ring networks of half a million states and 20 actions.
Researcher Affiliation Academia 1Technische Universit at M unchen, Germany 2King s College London, United Kingdom EMAIL, EMAIL
Pseudocode Yes Algorithm 1: Approximate Policy Iteration (API) api(t, π, w) := if t tmax or err ϵ or w= then (t, π , w , err, w=) else api(t + 1, π , w ) where w := upd w(π) π := greedy π(w ) err := factored err(π , w ) w= := w = w
Open Source Code Yes Code https://github.com/schaeffm/fmdp isabelle
Open Datasets No The paper mentions using "the ring and star topologies from (Guestrin et al. 2003)" for evaluation. This refers to problem domains or models described in a referenced paper, not publicly available datasets in the conventional sense (e.g., raw data files). The text does not provide concrete access information (link, DOI, or repository) for any specific dataset.
Dataset Splits No The paper evaluates an algorithm on "ring and star topologies" from a benchmark paper. It does not use "datasets" in the typical machine learning sense with explicit training/test/validation splits. Therefore, information about dataset splits is not applicable and not provided.
Hardware Specification Yes We run our implementation on an Intel i7-11800H CPU and set the discount factor to 0.9 in all our experiments.
Software Dependencies No The paper mentions using 'Isabelle/HOL' for development, and 'QSopt ex' and 'So Plex' as LP solvers. It also states code is exported to Scala. However, no specific version numbers for the utilized software dependencies (e.g., Isabelle/HOL version, So Plex version, Scala version) are provided within the main text that are relevant for replication.
Experiment Setup Yes We run our implementation on an Intel i7-11800H CPU and set the discount factor to 0.9 in all our experiments. In all runs, the weights converged after max. 5 iterations.