IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking
Authors: Shubham Dipak Ugare, Rohan Gumaste, Tarun Suresh, Gagandeep Singh, Sasa Misailovic
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation presents three distinct scenarios, which demonstrate the effectiveness of ITERGEN. First, we illustrate how it can be used to improve the accuracy of LLM-generated SQL queries by enforcing additional semantic constraints. ITERGEN achieves 18.5% mean improvement over the state-of-the-art grammar-guided generation technique (Ugare et al., 2024). Second, we show how ITERGEN effectively reduces privacy leaks in LLM-generated text from 51.4% to 0%, thus successfully safeguarding sensitive information while maintaining the quality of response. Third, we show that ITERGEN improves the accuracy of LLM-generated Vega-lite specification (a subset of JSON for data visualization) by 17.8% by enforcing semantic constraints. |
| Researcher Affiliation | Academia | Shubham Ugare, Rohan Gumaste, Tarun Suresh, Gagandeep Singh, Sasa Misailovic University of Illinois Urbana-Champaign EMAIL |
| Pseudocode | Yes | The detailed pseudocode for the forward and backward algorithms is presented in Appendix A.1. A.1.1 ALGORITHM 1: START FUNCTION A.1.2 ALGORITHM 2: FORWARD FUNCTION A.1.3 ALGORITHM 3: BACKWARD FUNCTION |
| Open Source Code | Yes | Our code and additional resources are available at http://structuredllm.com. ITERGEN code is available at https://github.com/uiuc-arc/itergen. We provide the source code of ITERGEN as part of the supplementary material that can be used to reproduce our results. |
| Open Datasets | Yes | We use the standard Spider (Yu et al., 2018) text-to-SQL dataset for the evaluation. This dataset has 1034 problems that are categorized into difficulty levels: easy (250), medium (440), hard (174), and extra hard (170). ... We use the DecodingTrust (Wang et al., 2024) privacy dataset... For the evaluation, we use the NLV Corpus (Srinivasan et al., 2021), a dataset comprising 814 examples of text utterances paired with corresponding Vega-Lite visualization specifications. |
| Dataset Splits | No | We use the standard Spider (Yu et al., 2018) text-to-SQL dataset for the evaluation. This dataset has 1034 problems that are categorized into difficulty levels: easy (250), medium (440), hard (174), and extra hard (170). ... For the evaluation, we use the NLV Corpus (Srinivasan et al., 2021), a dataset comprising 814 examples of text utterances paired with corresponding Vega-Lite visualization specifications. The paper describes the datasets, their difficulty categories, and total problem counts, but does not explicitly state how the data was split into training, validation, or test sets for the experiments. |
| Hardware Specification | Yes | Experimental Setup. We run experiments on a 48-core Intel Xeon Silver 4214R CPU with 2 NVIDIA RTX A5000 GPUs. ITERGEN is implemented using PyTorch (Paszke et al., 2019), the Hugging Face transformers library (Wolf et al., 2020), and the SYNCODE library (Ugare et al., 2024) for the parser-guided LLM generation infrastructure. |
| Software Dependencies | No | ITERGEN is implemented using PyTorch (Paszke et al., 2019), the Hugging Face transformers library (Wolf et al., 2020), and the SYNCODE library (Ugare et al., 2024) for the parser-guided LLM generation infrastructure. The paper lists the software libraries used (PyTorch, Hugging Face transformers, SYNCODE) but does not provide specific version numbers for these dependencies, which are crucial for reproducibility. |
| Experiment Setup | Yes | We use greedy decoding for the experiment, set ITERGEN's maximum limit for moving backward as max_iter=20, and set the ITERGEN recurrence penalty to 0.7, as it worked well on a small subset of the training dataset. We use \n\n as an additional stop word alongside the EOS token for all experiments, and use a max-new-tokens limit of 100 for all three methods. ... For ITERGEN we set a recurrence penalty γ to 0.7, and limit the number of per-email backtracking attempts to 10. ... For ITERGEN we set a recurrence penalty γ to 0.1, and set max_iter to 50. |
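To make the quoted hyperparameters concrete (greedy decoding, a recurrence penalty γ, and a max_iter backtracking budget), the following is a minimal, self-contained sketch of the forward/backward interaction they control. All names here (`BASE_SCORES`, `SCHEMA`, `generate_column`) are hypothetical illustrations, not the ITERGEN API: the real system backtracks over grammar symbols and LLM logits, whereas this toy backtracks over a single column-name choice.

```python
GAMMA = 0.7       # recurrence penalty, as in the paper's SQL experiments
MAX_ITER = 20     # backtracking budget (max_iter in the paper)

# Toy stand-in for model logits: scores over candidate column names.
BASE_SCORES = {"salary": 3.0, "name": 2.0, "age": 1.0}
# Toy semantic constraint: the generated column must exist in the schema.
SCHEMA = {"name", "age"}

def generate_column():
    """Greedy forward step + penalized backward step, up to MAX_ITER times."""
    penalties = {tok: 1.0 for tok in BASE_SCORES}
    for _ in range(MAX_ITER):
        # Forward: greedy pick under the current (penalized) scores.
        tok = max(BASE_SCORES, key=lambda t: BASE_SCORES[t] * penalties[t])
        if tok in SCHEMA:
            return tok  # semantic check passes: accept the symbol
        # Backward: discard the symbol and penalize its recurrence,
        # so repeated backtracking eventually surfaces an alternative.
        penalties[tok] *= GAMMA
    return None  # budget exhausted without a valid symbol

print(generate_column())  # "salary" is rejected twice before "name" wins
```

Note how γ governs how quickly a rejected choice loses to its runner-up: with γ=0.7, "salary" (score 3.0) needs two penalizations before "name" (score 2.0) overtakes it, while the privacy experiment's γ=0.1 would demote a rejected choice after a single backward step.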