Structure-Guided Large Language Models for Text-to-SQL Generation

Authors: Qinggang Zhang, Hao Chen, Junnan Dong, Shengyuan Chen, Feiran Huang, Xiao Huang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two benchmark datasets demonstrate that SGU-SQL consistently outperforms state-of-the-art text-to-SQL models. Experiments on two benchmarks verify that SGU-SQL outperforms state-of-the-art baselines, including 11 finetuning models, 7 structure learning models, and 14 in-context learning models.
Researcher Affiliation | Academia | (1) Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, China; (2) City University of Macau, Macau SAR, China; (3) College of Information Science and Technology, Jinan University, GZ, China.
Pseudocode | No | The paper describes the methodology using natural language, definitions, and mathematical formulations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions several open-source LLMs and baseline models, but it does not state that the code for the proposed SGU-SQL method is released, nor does it provide a repository link.
Open Datasets | Yes | Datasets: We assess the performance of text-to-SQL models using two renowned datasets, Spider (Yu et al., 2019) and BIRD (Li et al., 2023c).
Dataset Splits | Yes | Spider, a cross-domain text-to-SQL dataset, comprises 8,659 training instances and 1,034 development instances spanning 200 databases; each instance pairs a natural language question about a specific database with its corresponding SQL query. Evaluation uses the Spider-dev development split, since the test split has not been released.
Hardware Specification | No | The paper does not specify the hardware used for its experiments (e.g., GPU or CPU models, memory).
Software Dependencies | No | The paper lists various LLMs and other text-to-SQL methods as baselines, but it gives no versioned software dependencies for its own implementation (e.g., Python, PyTorch, or specific library versions).
Experiment Setup | No | The paper compares prompting strategies and mentions fine-tuning backbone LLMs with LoRA and QLoRA, but it does not report concrete hyperparameters (e.g., learning rate, batch size, number of epochs) or other system-level training settings in the main text.
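The LoRA fine-tuning mentioned in the last row can be sketched in a few lines. This is a generic illustration of low-rank adaptation, not SGU-SQL's actual configuration: the rank `r`, scale `alpha`, and weight shapes below are arbitrary assumptions, chosen only to show the mechanism.

```python
import numpy as np

# Generic LoRA sketch: a frozen weight matrix W is adapted by adding a
# low-rank update (alpha / r) * B @ A, where only A and B are trained.
# All shapes and hyperparameters here are illustrative, not from the paper.
rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 64, 64, 8, 16       # assumed dimensions, rank, and scale
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized

def lora_forward(x):
    """Forward pass with the low-rank adapter added on top of frozen W."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted model starts identical to the base model.
assert np.allclose(lora_forward(x), W @ x)
```

Because only `A` and `B` (with `r` much smaller than the layer dimensions) receive gradients, the trainable parameter count drops sharply relative to full fine-tuning, which is why such papers typically report `r`, `alpha`, and the target modules as key hyperparameters.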