Grammar-Forced Translation of Natural Language to Temporal Logic using LLMs

Authors: William H English, Dominic Simon, Sumit Kumar Jha, Rickard Ewetz

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the effectiveness of GraFT using the CW, GLTL, and Navi benchmarks. Compared with state-of-the-art translation approaches, it can be observed that GraFT improves the end-to-end translation accuracy by 5.49% and out-of-domain translation accuracy by 14.06% on average."
Researcher Affiliation | Academia | "1) Department of Electrical and Computer Engineering, University of Florida, Gainesville, Florida; 2) Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, Florida."
Pseudocode | Yes | "Algorithm 1: Temporal Logic Logits Processor"
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology: there is no repository link, no explicit code-release statement, and no indication of code in the supplementary materials.
Open Datasets | Yes | "Our evaluation datasets include Navigation (Wang et al., 2021), GLTL (Gopalan et al., 2018), and CW (MacGlashan et al., 2015). Some statistics on these datasets are given in Appendix A.1."
Dataset Splits | No | "We perform our evaluation of the translation models and end-to-end approaches using 1000 examples from each dataset." No explicit train/validation/test splits are reported.
Hardware Specification | Yes | "We conducted our evaluation on a machine with one NVIDIA RTX 4070 Ti Super GPU, one Intel i9-14900KF 32 Core CPU, and 64GB of RAM."
Software Dependencies | No | The paper mentions models such as BERT (Devlin et al., 2019) and T5 (Raffel et al., 2020), and states "The T5 checkpoint provided at the Hugging Face-hosted repository (Raffel et al., 2020)", but it does not specify concrete version numbers for any software libraries, frameworks, or environments used to implement or run the experiments.
Experiment Setup | Yes | "Each AP grounding model was trained for 3 epochs at a learning rate of 1e-5. Each translation model was trained for 3 epochs at a learning rate of 2e-5."
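The "Temporal Logic Logits Processor" named in the pseudocode row suggests grammar-forced decoding: at each generation step, tokens that would violate the temporal-logic grammar are masked out before the next token is chosen. The paper's actual algorithm is not reproduced here; the toy vocabulary, token ids, and helper names below are illustrative assumptions, shown only to sketch the general masking idea.

```python
# Minimal sketch of grammar-forced decoding (assumed mechanism, not the
# paper's implementation): logits of grammar-disallowed tokens are set to
# -inf, so a greedy step can only pick a grammar-legal token.
import math

def mask_logits(logits, allowed_ids):
    """Return a copy of `logits` with disallowed token logits set to -inf."""
    return [l if i in allowed_ids else -math.inf for i, l in enumerate(logits)]

def greedy_constrained_step(logits, allowed_ids):
    """Pick the highest-scoring token among those the grammar allows."""
    masked = mask_logits(logits, allowed_ids)
    return max(range(len(masked)), key=lambda i: masked[i])

# Toy vocabulary (hypothetical): 0="G", 1="F", 2="(", 3=")", 4="p"
logits = [0.1, 2.0, 0.5, 1.5, 0.3]
# Suppose the grammar only permits "(" or an atomic proposition next:
allowed = {2, 4}
print(greedy_constrained_step(logits, allowed))  # picks token 2, i.e. "("
```

In a real decoder this masking would run once per step inside the sampling loop (e.g. as a Hugging Face `LogitsProcessor`), with the allowed-token set derived from the grammar's current parser state.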