SynCode: LLM Generation with Grammar Augmentation
Authors: Shubham Ugare, Tarun Suresh, Hangoo Kang, Sasa Misailovic, Gagandeep Singh
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments evaluating the effectiveness of SynCode for JSON generation demonstrate that SynCode eliminates all syntax errors and significantly outperforms state-of-the-art baselines. Furthermore, our results underscore how SynCode significantly reduces 96.07% of syntax errors in generated Python and Go code, showcasing its substantial impact on enhancing syntactical precision in LLM generation. |
| Researcher Affiliation | Collaboration | Shubham Ugare University of Illinois Urbana-Champaign, USA Tarun Suresh University of Illinois Urbana-Champaign, USA Hangoo Kang University of Illinois Urbana-Champaign, USA Sasa Misailovic University of Illinois Urbana-Champaign, USA Gagandeep Singh University of Illinois Urbana-Champaign and VMware Research, USA |
| Pseudocode | Yes | Algorithm 1 Masked LLM Generation... Algorithm 2 Computing Grammar Mask... Algorithm 3 SynCode Generation... Algorithm 4 Incremental Parsing Algorithm |
| Open Source Code | Yes | Our code is available at https://github.com/uiuc-focal-lab/syncode |
| Open Datasets | Yes | We consider the JSON-Mode-Eval (Nous Research, 2024) dataset for text-to-JSON generation and the HumanEval and MBXP (Athiwaratkun et al., 2023) datasets for evaluating Python and Go code generation. We display examples of prompts from these datasets in Appendix A.7. JSON-Mode-Eval (Nous Research, 2024) consists of 100 zero-shot problems. Spider text-to-SQL (Yu et al., 2018) consists of 1,034 problems of varying difficulty levels: easy (250), medium (440), hard (174), and extra hard (170). Multilingual HumanEval (Athiwaratkun et al., 2023) is an extension of the original HumanEval collection (Chen et al., 2021)... MBXP (Athiwaratkun et al., 2023) is extended from the MBPP (Austin et al., 2021) dataset for Python to support other languages such as Go. |
| Dataset Splits | No | The paper lists dataset characteristics and sizes (e.g., Spider dataset difficulty levels with counts: easy (250), medium (440), hard (174), and extra hard (170)), and mentions generating 'n = 20 and n = 1 samples per problem' for code completion tasks. However, it does not explicitly provide information on how these datasets are split into training, validation, or test sets for the models evaluated, or specific percentages/counts for such splits. |
| Hardware Specification | Yes | We run experiments on a 48-core Intel Xeon Silver 4214R CPU with 2 NVidia RTX A5000 GPUs. |
| Software Dependencies | No | SynCode is implemented using PyTorch (Paszke et al., 2019a), the Hugging Face transformers library (Wolf et al., 2020), and the Lark library. While these libraries are mentioned, specific version numbers for PyTorch, Hugging Face transformers, and Lark are not provided in the text. |
| Experiment Setup | Yes | We set max new tokens nmax = 400, use greedy decoding, and use \n\n as an additional stopping condition for all experiments. We use the hyperparameters temperature = 0.2 and top_p = 0.95. |
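The pseudocode row above lists Algorithm 1 (Masked LLM Generation), in which a grammar-derived mask restricts each decoding step to syntactically valid tokens. Below is a minimal, hypothetical sketch of that masking idea with a toy vocabulary and a hardcoded validity check; it stands in for SynCode's actual incremental parser and DFA-based mask computation, which are far more involved.

```python
import math

def softmax(scores):
    """Convert raw logit scores to a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_greedy_step(logits, vocab, is_valid):
    """One decoding step: zero out grammar-invalid tokens, then pick
    the highest-probability remaining token (greedy decoding)."""
    probs = softmax(logits)
    masked = [p if is_valid(tok) else 0.0 for p, tok in zip(probs, vocab)]
    if sum(masked) == 0.0:
        raise ValueError("grammar mask rejected every token")
    best = max(range(len(vocab)), key=lambda i: masked[i])
    return vocab[best]

# Toy example: the partial output is '{"a": ', so a JSON grammar only
# permits tokens that can begin a value at this position.
vocab = ["}", "1", ":", '"x"', ","]
logits = [2.0, 1.5, 1.8, 0.5, 1.0]   # the model prefers "}" ...
valid_starts = {"1", '"x"'}          # ... but the grammar forbids it
tok = masked_greedy_step(logits, vocab, lambda t: t in valid_starts)
# tok == "1": the highest-scoring token among the grammar-valid ones
```

In the paper's setting the `is_valid` predicate is replaced by the grammar mask of Algorithm 2, computed from the incremental parser's state, so masking the unnormalized distribution guarantees every emitted token keeps the output syntactically valid.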