Earley-Driven Dynamic Pruning for Efficient Structured Decoding
Authors: Xintong Sun, Chi Wei, Minghao Tian, Shiwen Ni
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments on structured generation tasks, including JSON generation, JSON Schema validation, and semantic parsing, we demonstrate that Formatron not only consistently maintains high-precision compliant outputs but also achieves significant improvements in inference speed up to 2x compared to state-of-the-art implementations. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Rice University, Texas, the United States 2Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. Correspondence to: Shiwen Ni <EMAIL>. |
| Pseudocode | No | The paper describes algorithms conceptually, such as the Earley algorithm and its operations (Prediction, Scanning, Completion), but does not provide a clearly labeled pseudocode block or algorithm figure with structured steps. |
| Open Source Code | Yes | We release Formatron as open source at https://github.com/Dan-wanna-M/formatron. |
| Open Datasets | Yes | Test Task. Geoquery (Davis & Meltzer, 2007) transformation converts natural language queries into FunQL, adhering to fixed predicates and finite entity constraints. JSON Schema (Pezoa et al., 2016) generation produces JSON instances compliant with type, enumeration, and regular expression constraints. |
| Dataset Splits | No | The paper does not provide explicit training/test/validation dataset splits with percentages, sample counts, or specific file references for the Geoquery, JSON Schema, or JSON Grammar tasks. While it describes a data augmentation process for multiple runs in Appendix E, this is for generating test data variations, not defining standard model evaluation splits. |
| Hardware Specification | Yes | All experiments were conducted on a system equipped with an NVIDIA GeForce RTX 3090 (24GB VRAM) and an AMD EPYC 7452 32-core processor. |
| Software Dependencies | Yes | The software environment consisted of PyTorch 2.4.0 and CUDA 12.4, with model inference performed using Transformers v4.48.0. Four pre-trained large language models were employed in this study: google/gemma-2-9b-it (Gemma Team & Shreya Pathak, 2024), meta-llama/Llama-3-8B-Instruct (Dubey et al., 2024), mistralai/Mistral-7B-Instruct-v0.3 (Jiang et al., 2023), and qwen/Qwen2.5-7B-Instruct (Yang et al., 2024), all utilizing half-precision (FP16) inference. For more details on the Python libraries, see Appendix A. |
| Experiment Setup | Yes | Four pre-trained large language models were employed in this study: google/gemma-2-9b-it (Gemma Team & Shreya Pathak, 2024), meta-llama/Llama-3-8B-Instruct (Dubey et al., 2024), mistralai/Mistral-7B-Instruct-v0.3 (Jiang et al., 2023), and qwen/Qwen2.5-7B-Instruct (Yang et al., 2024), all utilizing half-precision (FP16) inference. |
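The Pseudocode row notes that the paper describes the Earley algorithm's three operations (Prediction, Scanning, Completion) only conceptually, without a labeled algorithm block. For reference, below is a minimal Earley recognizer sketch illustrating those three operations; this is an illustrative textbook version, not the paper's Formatron implementation, and the grammar encoding and names are our own assumptions:

```python
from collections import namedtuple

# Earley item: production lhs -> rhs, with a dot position and the
# chart index where the item originated.
Item = namedtuple("Item", "lhs rhs dot origin")

def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` derives from `start` under `grammar`.

    grammar: dict mapping nonterminal -> list of right-hand sides
    (tuples of symbols); a symbol is a nonterminal iff it is a grammar key.
    """
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add(Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        changed = True
        while changed:  # iterate until state set i stabilizes
            changed = False
            for item in list(chart[i]):
                if item.dot < len(item.rhs):
                    sym = item.rhs[item.dot]
                    if sym in grammar:
                        # Prediction: expand the expected nonterminal.
                        for rhs in grammar[sym]:
                            new = Item(sym, rhs, 0, i)
                            if new not in chart[i]:
                                chart[i].add(new)
                                changed = True
                    elif i < len(tokens) and tokens[i] == sym:
                        # Scanning: consume a matching terminal.
                        chart[i + 1].add(item._replace(dot=item.dot + 1))
                else:
                    # Completion: advance items waiting on this nonterminal.
                    for parent in list(chart[item.origin]):
                        if (parent.dot < len(parent.rhs)
                                and parent.rhs[parent.dot] == item.lhs):
                            new = parent._replace(dot=parent.dot + 1)
                            if new not in chart[i]:
                                chart[i].add(new)
                                changed = True

    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in chart[-1])
```

For example, with `grammar = {"S": [("S", "+", "S"), ("n",)]}`, the input `["n", "+", "n"]` is accepted and `["n", "+"]` is rejected. Formatron's contribution is dynamically pruning the token mask using such chart states during decoding, which this recognizer does not attempt to show.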