CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

Authors: Mohammadreza Pourreza, Hailong Li, Ruoxi Sun, Yeounoh Chung, Shayan Talaei, Gaurav Tarlok Kakkar, Yu Gan, Amin Saberi, Fatma Ozcan, Sercan Arik

ICLR 2025

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | We present comprehensive evaluations of the efficacy of the proposed methodologies of CHASE-SQL. Our innovative candidate generation approaches demonstrate superior performance compared to traditional generic CoT prompts, illustrating their capability in guiding LLMs through the decomposition of complex problems into manageable intermediate steps. Furthermore, the proposed selection agent significantly outperforms conventional consistency-based methods, contributing to the state-of-the-art results. Specifically, CHASE-SQL reaches an execution accuracy of 73.01% and 73.0% on the development and test sets of the challenging BIRD Text-to-SQL benchmark, outperforming all published and undisclosed methods on this benchmark by a large margin.
Researcher Affiliation Collaboration 1Google Cloud, Sunnyvale, CA, USA 2Stanford University, Stanford, CA, USA
Pseudocode | Yes | Algorithm 1: Divide-and-Conquer Chain-of-Thought (CoT) strategy for Text-to-SQL. Algorithm 2: Online synthetic-example generation strategy for Text-to-SQL. Algorithm 3: Picking the final SQL query from a pool of candidates. Algorithm 4: Query fixing method.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | We evaluate the performance of the proposed CHASE-SQL framework on two widely recognized cross-domain datasets: BIRD (Li et al., 2024c) and Spider (Yu et al., 2018).
Dataset Splits | Yes | The Spider dataset is divided into non-overlapping training, development, and test sets, similar to BIRD.
Hardware Specification | No | The paper mentions using Gemini and Claude models and training a Gemini 1.5 Flash model with the Vertex AI tuning API, but does not provide specific hardware details such as GPU/CPU models or memory capacity.
Software Dependencies | Yes | Moreover, by leveraging entirely open-source models, the Mistral Large model (AI, 2024) as the candidate generator and a fine-tuned Qwen-2.5-Coder model (Team, 2024) as the selector, our method achieved state-of-the-art performance of 70.33 on the BIRD development set with open-source models.
Experiment Setup | Yes | The Gemini 1.5 Flash model is trained for 10 epochs using a LoRA adapter with a rank of 16 via the Vertex AI tuning API.