Regret-Free Reinforcement Learning for Temporal Logic Specifications
Authors: R Majumdar, Mahmoud Salamati, Sadegh Soudjani
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have implemented our algorithm with empirical evaluation on reach-avoid specifications in a gridworld environment reported in App. C. In particular, we compare the performance and convergence of our algorithm to the sample complexity and convergence of a PAC-MDP algorithm for LTL (Perez et al., 2024), and show that our algorithm convergences faster within a smaller number of episodes. |
| Researcher Affiliation | Academia | MPI-SWS, Kaiserslautern, Germany University of Birmingham, Birmingham, UK. |
| Pseudocode | Yes | Algorithm 1 Regret-free algorithm for until formulas (Zero Reg) Algorithm 2 Extended value iteration (EVI) Algorithm 3 Computing optimistic MDP in Eq. (5) (Inner Max) Algorithm 4 Regret-free learning algorithm for general LTL specifications Algorithm 5 Graph learning algorithm (Graph Learn) Algorithm 6 Reachability policy synthesis for graph learning (Reach) |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit code release statement) for the source code. |
| Open Datasets | No | We considered a reach-avoid policy synthesis problem in the gridworld example described in Fig. 2. |
| Dataset Splits | No | The paper uses a simulated 'gridworld example' for its experiments, which is an environment rather than a dataset with explicit training/test/validation splits. |
| Hardware Specification | Yes | The experiments are performed on a laptop with core i7 CPU at 3.10GHz, with 8GB of RAM. |
| Software Dependencies | No | The paper mentions "We implement Alg. 1" but does not specify any software names or versions (e.g., Python, PyTorch, specific libraries or solvers with version numbers). |
| Experiment Setup | Yes | We set δ = 0.1 over 10 runs. Furthermore, we group all of the cells associated with the wall into an absorbing state B, such that we have |S| = 17 and |A| = 4. The considered specification is ϕ = Avoid U Goal, where G is marked as Goal and walls are marked as Avoid. ... We set pmin = 0.01. |