Structure-Guided Large Language Models for Text-to-SQL Generation
Authors: Qinggang Zhang, Hao Chen, Junnan Dong, Shengyuan Chen, Feiran Huang, Xiao Huang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two benchmark datasets demonstrate that SGU-SQL consistently outperforms state-of-the-art baselines, including 11 finetuning models, 7 structure learning models, and 14 in-context learning models. |
| Researcher Affiliation | Academia | ¹Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, China; ²City University of Macau, Macau SAR, China; ³College of Information Science and Technology, Jinan University, GZ, China. |
| Pseudocode | No | The paper describes the methodology using natural language, definitions, and mathematical formulations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions several open-source LLMs and baseline models but does not explicitly state that the code for the proposed SGU-SQL methodology is open-source, nor does it provide any specific repository links. |
| Open Datasets | Yes | We assess the performance of text-to-SQL models using two renowned datasets, Spider (Yu et al., 2019) and BIRD (Li et al., 2023c). |
| Dataset Splits | Yes | Spider, a cross-domain text-to-SQL dataset, comprises 8659 instances in the training split and 1034 instances in the development split, spanning 200 databases. Each instance pairs a natural language question about a specific database with its corresponding SQL query. For evaluation, we use the Spider-dev development split, since the test split has not been released. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using various Large Language Models (LLMs) and other text-to-SQL methods as baselines, but it does not specify any software dependencies with version numbers for its own implementation (e.g., Python, PyTorch, specific libraries and their versions). |
| Experiment Setup | No | The paper discusses various prompting strategies and compares them, and mentions fine-tuning methods like LoRA and QLoRA for backbone LLMs. However, it does not explicitly provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings for its experiments in the main text. |
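The split counts quoted above (8659 training and 1034 development instances across 200 databases) can be verified directly from Spider's released JSON files. Below is a minimal sketch of such a tally; the two inline records are a hypothetical sample in Spider's record schema (real files contain thousands of entries, and the field names `db_id`, `question`, and `query` follow the public Spider release):

```python
import json
from collections import Counter

# Hypothetical two-record sample mimicking Spider's JSON schema:
# each record pairs a natural-language question with its SQL query
# and the database (db_id) it targets.
sample = json.loads("""
[
  {"db_id": "concert_singer", "question": "How many singers do we have?",
   "query": "SELECT count(*) FROM singer"},
  {"db_id": "pets_1", "question": "How many pets are owned?",
   "query": "SELECT count(*) FROM pets"}
]
""")

def split_stats(records):
    """Return (instance count, number of distinct databases) for one split."""
    dbs = Counter(r["db_id"] for r in records)
    return len(records), len(dbs)

n_instances, n_dbs = split_stats(sample)
print(n_instances, n_dbs)  # → 2 2
```

Running `split_stats` on the actual `train_spider.json` and `dev.json` files would confirm whether the reported 8659/1034 split sizes hold for a given Spider release.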