Grammar-Forced Translation of Natural Language to Temporal Logic using LLMs
Authors: William H English, Dominic Simon, Sumit Kumar Jha, Rickard Ewetz
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of GraFT using the CW, GLTL, and Navi benchmarks. Compared with state-of-the-art translation approaches, it can be observed that GraFT improves the end-to-end translation accuracy by 5.49% and out-of-domain translation accuracy by 14.06% on average. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, University of Florida, Gainesville, Florida 2Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, Florida. |
| Pseudocode | Yes | Algorithm 1 Temporal Logic Logits Processor |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It does not contain a specific repository link, an explicit code release statement, or indicate code in supplementary materials. |
| Open Datasets | Yes | Our evaluation datasets include Navigation (Wang et al., 2021), GLTL (Gopalan et al., 2018), and CW (MacGlashan et al., 2015). Some statistics on these datasets are given in Appendix A.1. |
| Dataset Splits | No | We perform our evaluation of the translation models and end-to-end approaches using 1000 examples from each dataset. |
| Hardware Specification | Yes | We conducted our evaluation on a machine with one NVIDIA RTX 4070 Ti Super GPU, one Intel i9-14900KF 32 Core CPU, and 64GB of RAM. |
| Software Dependencies | No | The paper mentions models like BERT (Devlin et al., 2019) and T5 (Raffel et al., 2020), and states 'The T5 checkpoint provided at the Hugging Face-hosted repository (Raffel et al., 2020)', but it does not specify concrete version numbers for any software libraries, frameworks, or environments used to implement or run the experiments. |
| Experiment Setup | Yes | Each AP grounding model was trained for 3 epochs at a learning rate of 1e-5. Each translation model was trained for 3 epochs at a learning rate of 2e-5. |
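The pseudocode the review points to, "Algorithm 1 Temporal Logic Logits Processor", is not reproduced here, but the general technique it names is grammar-constrained decoding: at each step, the processor masks the model's next-token logits so that only tokens legal under the temporal-logic grammar can be sampled. The sketch below is a hypothetical minimal illustration of that masking idea, not the paper's algorithm; the toy vocabulary, the `allowed_after_G` rule, and both function names are invented for this example.

```python
import math

def grammar_mask(logits, allowed):
    """Illustrative grammar constraint (not the paper's Algorithm 1):
    set logits of tokens outside `allowed` to -inf so the decoder can
    only emit grammar-legal continuations."""
    return [x if i in allowed else float("-inf") for i, x in enumerate(logits)]

def softmax(xs):
    """Numerically stable softmax; exp(-inf) underflows cleanly to 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy vocabulary for an LTL-like grammar: 0="G", 1="F", 2="p", 3=")".
# Hypothetical rule: after the temporal operator "G", only the atom "p"
# may follow.
allowed_after_G = {2}
logits = [1.0, 2.5, 0.3, 1.7]            # raw model scores
probs = softmax(grammar_mask(logits, allowed_after_G))
# All probability mass falls on the single grammar-legal token "p".
```

In a real decoder the allowed-token set would be derived from the grammar's parser state after the tokens generated so far, and the masked logits would be fed back into the sampling loop.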