Regret-Free Reinforcement Learning for Temporal Logic Specifications

Authors: R Majumdar, Mahmoud Salamati, Sadegh Soudjani

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We have implemented our algorithm with empirical evaluation on reach-avoid specifications in a gridworld environment reported in App. C. In particular, we compare the performance and convergence of our algorithm to the sample complexity and convergence of a PAC-MDP algorithm for LTL (Perez et al., 2024), and show that our algorithm converges within a smaller number of episodes."
Researcher Affiliation | Academia | MPI-SWS, Kaiserslautern, Germany; University of Birmingham, Birmingham, UK.
Pseudocode | Yes | Algorithm 1: Regret-free algorithm for until formulas (Zero Reg); Algorithm 2: Extended value iteration (EVI); Algorithm 3: Computing optimistic MDP in Eq. (5) (Inner Max); Algorithm 4: Regret-free learning algorithm for general LTL specifications; Algorithm 5: Graph learning algorithm (Graph Learn); Algorithm 6: Reachability policy synthesis for graph learning (Reach)
Open Source Code | No | The paper provides no concrete access information for the source code, such as a repository link or an explicit code-release statement.
Open Datasets | No | "We considered a reach-avoid policy synthesis problem in the gridworld example described in Fig. 2."
Dataset Splits | No | The experiments use a simulated gridworld, which is an environment rather than a dataset with explicit training/validation/test splits.
Hardware Specification | Yes | "The experiments are performed on a laptop with core i7 CPU at 3.10GHz, with 8GB of RAM."
Software Dependencies | No | The paper states "We implement Alg. 1" but does not name any software or version numbers (e.g., Python, PyTorch, or specific libraries or solvers).
Experiment Setup | Yes | "We set δ = 0.1 over 10 runs. Furthermore, we group all of the cells associated with the wall into an absorbing state B, such that we have |S| = 17 and |A| = 4. The considered specification is ϕ = Avoid U Goal, where G is marked as Goal and walls are marked as Avoid. ... We set pmin = 0.01."
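To make the experiment setup concrete, the following is a minimal sketch of the reach-avoid value-iteration problem the paper evaluates: a hypothetical 4x4 gridworld (16 free cells) plus one absorbing wall state B, giving |S| = 17 and |A| = 4 as in the quoted setup. The grid layout, goal position, and slip probability are illustrative assumptions; the exact instance of Fig. 2 is not reproduced here.

```python
import numpy as np

N = 4                       # hypothetical 4x4 grid of free cells
S = N * N + 1               # 16 free cells + absorbing wall state B -> |S| = 17
B = N * N                   # index of the absorbing wall state (violates Avoid)
GOAL = N * N - 1            # bottom-right cell as Goal (assumption)
SLIP = 0.1                  # slip-into-wall probability (assumption)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right -> |A| = 4

def step_dist(s, a):
    """Transition distribution for one state-action pair."""
    if s in (B, GOAL):
        return {s: 1.0}                        # both states are absorbing
    r, c = divmod(s, N)
    dr, dc = ACTIONS[a]
    nr, nc = r + dr, c + dc
    if 0 <= nr < N and 0 <= nc < N:
        return {nr * N + nc: 1.0 - SLIP, B: SLIP}
    return {B: 1.0}                            # stepping off the grid hits a wall

def max_reach_prob(tol=1e-8):
    """Value iteration for Pr_max(Avoid U Goal); entering B falsifies Avoid."""
    V = np.zeros(S)
    V[GOAL] = 1.0
    while True:
        Vn = V.copy()
        for s in range(S):
            if s in (B, GOAL):
                continue
            Vn[s] = max(sum(p * V[t] for t, p in step_dist(s, a).items())
                        for a in range(len(ACTIONS)))
        if np.max(np.abs(Vn - V)) < tol:
            return Vn
        V = Vn

V = max_reach_prob()
```

Under these assumptions, the optimal satisfaction probability from a cell decays geometrically with its Manhattan distance to the goal, since each in-bounds move succeeds with probability 1 - SLIP.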
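The pseudocode list above names Algorithm 3 (Inner Max) for computing the optimistic MDP inside extended value iteration. The following sketch shows the standard L1-ball inner maximization from the UCRL2 line of work (Jaksch et al., 2010), which plays the same role; it is not the paper's implementation, and the function name and interface are hypothetical.

```python
def optimistic_dist(p_hat, conf, V):
    """Maximize sum_s p(s) * V(s) over an L1 ball of radius `conf` around the
    empirical distribution `p_hat` (standard EVI subroutine; illustrative only).

    Shift up to conf/2 probability mass onto the highest-value successor, then
    restore normalization by removing mass from low-value successors first.
    """
    p = list(p_hat)
    best = max(range(len(V)), key=lambda s: V[s])
    p[best] = min(1.0, p_hat[best] + conf / 2.0)
    # Remove the excess mass starting from the lowest-value states.
    order = sorted((s for s in range(len(V)) if s != best), key=lambda s: V[s])
    excess = sum(p) - 1.0
    for s in order:
        cut = min(p[s], excess)
        p[s] -= cut
        excess -= cut
        if excess <= 0:
            break
    return p
```

For example, with empirical distribution [0.5, 0.3, 0.2], radius 0.4, and values [0, 1, 2], the routine moves 0.2 of mass from the lowest-value state onto the highest-value one, returning a distribution that still sums to one.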