IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking

Authors: Shubham Dipak Ugare, Rohan Gumaste, Tarun Suresh, Gagandeep Singh, Sasa Misailovic

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation presents three distinct scenarios, which demonstrate the effectiveness of ITERGEN. First, we illustrate how it can be used to improve the accuracy of LLM-generated SQL queries by enforcing additional semantic constraints. ITERGEN achieves 18.5% mean improvement over the state-of-the-art grammar-guided generation technique (Ugare et al., 2024). Second, we show how ITERGEN effectively reduces privacy leaks in LLM-generated text from 51.4% to 0%, thus successfully safeguarding sensitive information while maintaining the quality of response. Third, we show that ITERGEN improves the accuracy of LLM-generated Vega-Lite specifications (a subset of JSON for data visualization) by 17.8% by enforcing semantic constraints.
Researcher Affiliation | Academia | Shubham Ugare, Rohan Gumaste, Tarun Suresh, Gagandeep Singh, Sasa Misailovic (University of Illinois Urbana-Champaign)
Pseudocode | Yes | The detailed pseudocode for the forward and backward algorithms is presented in Appendix A.1 (A.1.1 Algorithm 1: START function; A.1.2 Algorithm 2: FORWARD function; A.1.3 Algorithm 3: BACKWARD function).
Open Source Code | Yes | Our code and additional resources are available at http://structuredllm.com. ITERGEN code is available at https://github.com/uiuc-arc/itergen. We provide the source code of ITERGEN as part of the supplementary material that can be used to reproduce our results.
Open Datasets | Yes | We use the standard Spider (Yu et al., 2018) text-to-SQL dataset for the evaluation. This dataset has 1034 problems, categorized into difficulty levels: easy (250), medium (440), hard (174), and extra hard (170). ... We use the DecodingTrust (Wang et al., 2024) privacy dataset... For the evaluation, we use the NLV Corpus (Srinivasan et al., 2021), a dataset comprising 814 examples of text utterances paired with corresponding Vega-Lite visualization specifications.
Dataset Splits | No | We use the standard Spider (Yu et al., 2018) text-to-SQL dataset for the evaluation. This dataset has 1034 problems, categorized into difficulty levels: easy (250), medium (440), hard (174), and extra hard (170). ... For the evaluation, we use the NLV Corpus (Srinivasan et al., 2021), a dataset comprising 814 examples of text utterances paired with corresponding Vega-Lite visualization specifications. The paper lists the datasets, their difficulty categorization, and total problem counts, but does not explicitly describe how the data was split into training, validation, or test sets for these experiments.
Hardware Specification | Yes | Experimental Setup. We run experiments on a 48-core Intel Xeon Silver 4214R CPU with 2 NVIDIA RTX A5000 GPUs. ITERGEN is implemented using PyTorch (Paszke et al., 2019), the Hugging Face Transformers library (Wolf et al., 2020), and the SYNCODE library (Ugare et al., 2024) for the parser-guided LLM generation infrastructure.
Software Dependencies | No | ITERGEN is implemented using PyTorch (Paszke et al., 2019), the Hugging Face Transformers library (Wolf et al., 2020), and the SYNCODE library (Ugare et al., 2024) for the parser-guided LLM generation infrastructure. The paper lists the software libraries used (PyTorch, Hugging Face Transformers, SYNCODE) but does not provide specific version numbers for these dependencies, which are crucial for reproducibility.
Experiment Setup | Yes | We use greedy decoding for the experiment, set ITERGEN's maximum limit for moving backward as max_iter=20, and set the ITERGEN recurrence penalty to 0.7, as it worked well on a small subset of the training dataset. We use \n\n as an additional stop word to the EOS token for all experiments and use a max new token limit of 100 for all three methods. ... For ITERGEN we set a recurrence penalty γ of 0.7, and limit the number of per-email backtracking attempts to 10. ... For ITERGEN we set a recurrence penalty γ of 0.1, and set max_iter to 50.
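The forward/backward loop referenced in the Pseudocode row (Appendix A.1 of the paper) can be illustrated with a toy sketch. This is NOT the authors' implementation: the real algorithms operate over a grammar parser state and real LLM logits, whereas here a deterministic `mock_model`, a caller-supplied `is_valid` check, and the `generate` helper are all hypothetical stand-ins, kept only to show the forward-step / backtrack-step / iteration-budget structure described in the review.

```python
# Toy sketch of an IterGen-style forward/backward generation loop.
# Assumptions: mock_model, is_valid, and generate are illustrative names,
# not the paper's API; semantic checking is reduced to a token predicate.

def mock_model(prefix, banned):
    """Deterministic stand-in for an LLM: proposes the first vocabulary
    token that is neither banned nor already present in the prefix."""
    vocab = ["SELECT", "name", "FROM", "users", ";"]
    for tok in vocab:
        if tok not in banned and tok not in prefix:
            return tok
    return ";"  # fallback: terminate

def generate(is_valid, max_iter=20):
    """Forward: extend the sequence one token at a time.
    Backward: on a semantic-constraint violation, truncate to the last
    valid prefix and ban the offending token so it is not re-proposed
    (a crude analogue of discouraging regeneration after backtracking).
    max_iter bounds the total number of forward/backward steps,
    mirroring the max_iter=20 setting quoted above."""
    tokens, banned, iters = [], set(), 0
    while iters < max_iter:
        iters += 1
        tok = mock_model(tokens, banned)
        tokens.append(tok)       # forward step
        if not is_valid(tokens):
            tokens.pop()         # backward step: backtrack
            banned.add(tok)
            continue
        if tok == ";":           # generation complete
            break
    return tokens

# Example semantic constraint: forbid the token "name".
result = generate(lambda toks: "name" not in toks)
```

Here the loop first proposes "name", detects the violation, backtracks, and then completes a sequence that satisfies the constraint.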
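The recurrence penalty γ quoted in the Experiment Setup row (0.7 for SQL, 0.1 for Vega-Lite) can be read as down-weighting tokens that earlier generation attempts backtracked over. A minimal sketch of one plausible realization, assuming a multiplicative penalty on unnormalized probabilities (the function name and the exact penalty form are assumptions, not taken from the paper):

```python
import math

def apply_recurrence_penalty(logits, backtracked_ids, gamma):
    """Subtract log(1/gamma) from the logits of previously backtracked
    tokens. Since softmax weights are proportional to exp(logit), this
    scales each penalized token's unnormalized weight by gamma
    (e.g. gamma = 0.7 leaves 70% of the original weight)."""
    penalty = math.log(1.0 / gamma)
    return [l - penalty if i in backtracked_ids else l
            for i, l in enumerate(logits)]

# Penalize token id 1 with gamma = 0.7; other logits are untouched.
adjusted = apply_recurrence_penalty([1.0, 2.0, 3.0], {1}, 0.7)
```

A smaller γ (such as the 0.1 used for Vega-Lite) imposes a larger log-space penalty, steering the decoder more aggressively away from previously rejected continuations.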