DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL

Authors: Mathias Jackermeier, Alessandro Abate

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments in a variety of discrete and continuous domains demonstrate that our approach is able to zero-shot satisfy a wide range of finite- and infinite-horizon specifications, and outperforms existing methods in terms of both satisfaction probability and efficiency. Code available at: https://deep-ltl.github.io/" "5 EXPERIMENTS: We evaluate our approach, called DeepLTL, in a variety of environments and on a range of LTL specifications of varying difficulty."
Researcher Affiliation | Academia | Mathias Jackermeier, Alessandro Abate — Department of Computer Science, University of Oxford (EMAIL)
Pseudocode | Yes | Algorithm 1: Computing paths to accepting cycles
Input: an LDBA B = (Q, q0, Σ, δ, F, E) and current state q.
 1: procedure DFS(q, p, i)    ▷ i is the index of the last seen accepting state, or −1 otherwise
 2:   P ← ∅
 3:   if q ∈ F then
 4:     i ← |p|
 5:   end if
 6:   for all a ∈ 2^AP ∪ {ε} do
 7:     p′ ← [p, q]
 8:     q′ ← δ(q, a)
 9:     if q′ ∈ p′ then
10:       if index of q′ in p′ ≤ i then
11:         P ← P ∪ {p′}
12:       end if
13:     else
14:       P ← P ∪ DFS(q′, p′, i)
15:     end if
16:   end for
17:   return P
18: end procedure
19: i ← 0 if q ∈ F else i ← −1
20: return DFS(q, [], i)
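A minimal, runnable sketch of Algorithm 1 may clarify the recursion. The data representation is an assumption (the paper does not prescribe one): the LDBA transition function δ is a dict keyed by (state, symbol) pairs, the accepting set F is a Python set, and the alphabet list stands in for 2^AP ∪ {ε}.

```python
def accepting_cycle_paths(delta, accepting, q, alphabet):
    """Sketch of Algorithm 1: enumerate simple paths from state q that
    close a cycle through an accepting state of the LDBA.

    delta:     dict mapping (state, symbol) -> successor state (assumption:
               missing keys model undefined transitions)
    accepting: set of accepting states F
    alphabet:  iterable standing in for 2^AP plus the epsilon action
    """
    def dfs(q, p, i):
        # i is the index in p of the last accepting state seen, or -1.
        P = []
        if q in accepting:
            i = len(p)
        for a in alphabet:
            p2 = p + [q]                 # extend the path with the current state
            q2 = delta.get((q, a))
            if q2 is None:
                continue                 # no transition on this symbol
            if q2 in p2:
                # Cycle closed; keep it only if it passes through an
                # accepting state (the loop-back index is at most i).
                if p2.index(q2) <= i:
                    P.append(p2)
            else:
                P.extend(dfs(q2, p2, i))
        return P

    i = 0 if q in accepting else -1
    return dfs(q, [], i)
```

For instance, on a two-state LDBA with accepting state q0 and transitions q0 --a--> q1 --b--> q0, the only accepting cycle found from q0 is the path [q0, q1].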
Open Source Code | Yes | Code available at: https://deep-ltl.github.io/
Open Datasets | Yes | LetterWorld environment (Vaezipoor et al., 2021)... ZoneEnv environment from Vaezipoor et al. (2021)... "Our implementation is based on the Safety Gymnasium suite (Ji et al., 2023)"... FlatWorld environment (Voloshin et al., 2023; Shah et al., 2024)...
Dataset Splits | No | The paper describes randomly sampled tasks and initial positions within the environments rather than predefined training/evaluation splits. For example: "Both the zone and robot positions are randomly sampled at the beginning of each episode." (ZoneEnv, F.1)
Hardware Specification | No | The paper mentions training for 15M interaction steps in each environment but provides no details about the hardware used (e.g., GPU model, CPU type, memory).
Software Dependencies | No | "In our experiments, we use proximal policy optimisation (PPO) (Schulman et al., 2017) to optimise the policy, but our approach can be combined with any RL algorithm. We use the Adam optimiser (Kingma & Ba, 2015) for all methods and environments." The paper names PPO and the Adam optimiser but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | Neural network architectures: "Our choice of neural network architectures is similar to previous work (Vaezipoor et al., 2021). For DeepLTL and LTL2Action, we employ a fully connected actor network with [64, 64, 64] units and ReLU as the activation function. The critic has network structure [64, 64] and uses Tanh activations in LetterWorld and ZoneEnv, and ReLU activations in FlatWorld..." PPO hyperparameters: "The hyperparameters for PPO (Schulman et al., 2017) are listed in Table 4. We use the Adam optimiser (Kingma & Ba, 2015) for all methods and environments. The threshold λ for strict negative assignments in DeepLTL is set to 0.4 across experiments. We design training curricula in order to gradually expose the policy to more challenging tasks."
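The architecture description above pins down only the hidden sizes and activations, so a sketch is straightforward. This PyTorch version is an assumption about the framework (the paper does not state one); `obs_dim` and `n_actions` are placeholder input/output sizes, and the LTL-specification encoding that DeepLTL feeds the networks is omitted.

```python
import torch
import torch.nn as nn

def make_actor(obs_dim: int, n_actions: int) -> nn.Module:
    """Fully connected actor with [64, 64, 64] hidden units and ReLU,
    as described for DeepLTL and LTL2Action. Output head size is a
    placeholder for the action parameterisation."""
    return nn.Sequential(
        nn.Linear(obs_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )

def make_critic(obs_dim: int, activation=nn.Tanh) -> nn.Module:
    """Critic with [64, 64] hidden units; Tanh activations in
    LetterWorld/ZoneEnv and ReLU in FlatWorld per the paper."""
    return nn.Sequential(
        nn.Linear(obs_dim, 64), activation(),
        nn.Linear(64, 64), activation(),
        nn.Linear(64, 1),  # scalar state-value estimate
    )
```

Either network can then be optimised with PPO and Adam as the reported setup describes, e.g. `make_critic(obs_dim, activation=nn.ReLU)` for the FlatWorld variant.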