DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL

Authors: Mathias Jackermeier, Alessandro Abate

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments in a variety of discrete and continuous domains demonstrate that our approach is able to zero-shot satisfy a wide range of finite- and infinite-horizon specifications, and outperforms existing methods in terms of both satisfaction probability and efficiency. Code available at: https://deep-ltl.github.io/" "5 EXPERIMENTS: We evaluate our approach, called DeepLTL, in a variety of environments and on a range of LTL specifications of varying difficulty."
Researcher Affiliation | Academia | Mathias Jackermeier, Alessandro Abate — Department of Computer Science, University of Oxford (EMAIL)
Pseudocode | Yes | Algorithm 1: Computing paths to accepting cycles
Input: an LDBA B = (Q, q0, Σ, δ, F, E) and current state q.
 1: procedure DFS(q, p, i)    ▷ i is the index of the last seen accepting state, or −1 otherwise
 2:   P ← ∅
 3:   if q ∈ F then
 4:     i ← |p|
 5:   end if
 6:   for all a ∈ 2^AP ∪ {ε} do
 7:     p′ ← [p, q]
 8:     q′ ← δ(q, a)
 9:     if q′ ∈ p′ then
10:       if index of q′ in p′ ≤ i then
11:         P ← P ∪ {p′}
12:       end if
13:     else
14:       P ← P ∪ DFS(q′, p′, i)
15:     end if
16:   end for
17:   return P
18: end procedure
19: i ← 0 if q ∈ F else i ← −1
20: return DFS(q, [], i)
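A minimal, runnable sketch of Algorithm 1 may clarify the recursion. The data representation is an assumption (the paper does not prescribe one): the LDBA transition function δ is a dict keyed by (state, symbol) pairs, the accepting set F is a Python set, and the alphabet list stands in for 2^AP ∪ {ε}.

```python
def accepting_cycle_paths(delta, accepting, q, alphabet):
    """Sketch of Algorithm 1: enumerate simple paths from state q that
    close a cycle through an accepting state of the LDBA.

    delta:     dict mapping (state, symbol) -> successor state (assumption:
               missing keys model undefined transitions)
    accepting: set of accepting states F
    alphabet:  iterable standing in for 2^AP plus the epsilon action
    """
    def dfs(q, p, i):
        # i is the index in p of the last accepting state seen, or -1.
        P = []
        if q in accepting:
            i = len(p)
        for a in alphabet:
            p2 = p + [q]                 # extend the path with the current state
            q2 = delta.get((q, a))
            if q2 is None:
                continue                 # no transition on this symbol
            if q2 in p2:
                # Cycle closed; keep it only if it passes through an
                # accepting state (the loop-back index is at most i).
                if p2.index(q2) <= i:
                    P.append(p2)
            else:
                P.extend(dfs(q2, p2, i))
        return P

    i = 0 if q in accepting else -1
    return dfs(q, [], i)
```

For instance, on a two-state LDBA with accepting state q0 and transitions q0 --a--> q1 --b--> q0, the only accepting cycle found from q0 is the path [q0, q1].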
Open Source Code | Yes | Code available at: https://deep-ltl.github.io/
Open Datasets | Yes | LetterWorld environment (Vaezipoor et al., 2021)... ZoneEnv environment from Vaezipoor et al. (2021)... "Our implementation is based on the Safety Gymnasium suite (Ji et al., 2023)"... FlatWorld environment (Voloshin et al., 2023; Shah et al., 2024)...
Dataset Splits | No | The paper describes randomly sampled tasks and initial positions within the environments rather than predefined training/evaluation splits. For example: "Both the zone and robot positions are randomly sampled at the beginning of each episode." (ZoneEnv, F.1)
Hardware Specification | No | The paper mentions training for 15M interaction steps in each environment but provides no details about the hardware used (e.g., GPU model, CPU type, memory).
Software Dependencies | No | "In our experiments, we use proximal policy optimisation (PPO) (Schulman et al., 2017) to optimise the policy, but our approach can be combined with any RL algorithm. We use the Adam optimiser (Kingma & Ba, 2015) for all methods and environments." The paper names PPO and the Adam optimiser but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | Neural network architectures: "Our choice of neural network architectures is similar to previous work (Vaezipoor et al., 2021). For DeepLTL and LTL2Action, we employ a fully connected actor network with [64, 64, 64] units and ReLU as the activation function. The critic has network structure [64, 64] and uses Tanh activations in LetterWorld and ZoneEnv, and ReLU activations in FlatWorld..." PPO hyperparameters: "The hyperparameters for PPO (Schulman et al., 2017) are listed in Table 4. We use the Adam optimiser (Kingma & Ba, 2015) for all methods and environments. The threshold λ for strict negative assignments in DeepLTL is set to 0.4 across experiments. We design training curricula in order to gradually expose the policy to more challenging tasks."
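The architecture description above pins down only the hidden sizes and activations, so a sketch is straightforward. This PyTorch version is an assumption about the framework (the paper does not state one); `obs_dim` and `n_actions` are placeholder input/output sizes, and the LTL-specification encoding that DeepLTL feeds the networks is omitted.

```python
import torch
import torch.nn as nn

def make_actor(obs_dim: int, n_actions: int) -> nn.Module:
    """Fully connected actor with [64, 64, 64] hidden units and ReLU,
    as described for DeepLTL and LTL2Action. Output head size is a
    placeholder for the action parameterisation."""
    return nn.Sequential(
        nn.Linear(obs_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )

def make_critic(obs_dim: int, activation=nn.Tanh) -> nn.Module:
    """Critic with [64, 64] hidden units; Tanh activations in
    LetterWorld/ZoneEnv and ReLU in FlatWorld per the paper."""
    return nn.Sequential(
        nn.Linear(obs_dim, 64), activation(),
        nn.Linear(64, 64), activation(),
        nn.Linear(64, 1),  # scalar state-value estimate
    )
```

Either network can then be optimised with PPO and Adam as the reported setup describes, e.g. `make_critic(obs_dim, activation=nn.ReLU)` for the FlatWorld variant.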