MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL
Authors: Arian Askari, Christian Poelitz, Xinye Tang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments show that MAGIC's guideline outperforms expert humans' created ones. We empirically find out that the guideline produced by MAGIC enhances the interpretability of the corrections made, providing insights in analyzing the reason behind the failures and successes of LLMs in self-correction. |
| Researcher Affiliation | Collaboration | 1Leiden University, 2Microsoft Research Cambridge, UK, 3Microsoft Redmond |
| Pseudocode | No | The paper includes block diagrams (Figure 2) and structured prompt templates (Figures 3, 4, 5, 6) but not formal pseudocode or algorithm blocks describing the method's steps. |
| Open Source Code | Yes | We publish all code to reproduce our experiments as open source: https://github.com/microsoft/SynQo |
| Open Datasets | Yes | Datasets https://huggingface.co/datasets/microsoft/MAGIC The Spider (Yu et al. 2018) dataset... The BIRD dataset (Li et al. 2023)... |
| Dataset Splits | Yes | The Spider (Yu et al. 2018) dataset is a collection of 10,181 questions and 5,693 unique complex SQL queries across 200 databases in 138 domains, with each domain featuring multiple tables. It is divided into training, development, and test sets with 8,659, 1,034, and 2,147 examples, respectively, across 146, 20, and 34 distinct databases, ensuring no overlap between sets. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions models and frameworks like 'GPT-4' and 'DIN-SQL', but does not provide specific ancillary software dependencies with version numbers (e.g., Python, PyTorch, CUDA, or specific library versions). |
| Experiment Setup | Yes | We set 5 as the maximum number of iterations. We determined that a feedback batch size of 10 is optimal. For self-consistency (Wang et al. 2022; Gao et al. 2023), we generate 20 SQL queries... For the Multiple-Prompt baseline, we follow the approach in (Lee et al. 2024) by reordering candidate tables in the prompt and generating up to 20 different combinations... |
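For context on the self-consistency step quoted above (sampling 20 SQL candidates and keeping the most common one), a minimal sketch of majority voting over candidates is shown below. This is a simplified string-level illustration, not the paper's code: the candidate queries and the `normalize` helper are hypothetical, and the actual method votes over execution results rather than raw query text.

```python
from collections import Counter

def self_consistency_vote(candidates: list[str]) -> str:
    """Return the most frequent candidate after light normalization.

    Normalization (collapse whitespace, lowercase, drop trailing ';')
    is a stand-in for the stronger equivalence check of comparing
    query execution results.
    """
    def normalize(sql: str) -> str:
        return " ".join(sql.split()).lower().rstrip(";")

    counts = Counter(normalize(c) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    # Return the first original candidate whose normalized form won.
    return next(c for c in candidates if normalize(c) == winner)

# Hypothetical LLM samples (the paper uses 20 per question).
samples = [
    "SELECT name FROM students WHERE gpa > 3.5;",
    "select name from students where gpa > 3.5",
    "SELECT s.name FROM students s WHERE s.gpa > 3.5;",
    "SELECT name FROM students WHERE gpa > 3.5;",
]
print(self_consistency_vote(samples))
# The two surface variants of the first query collapse to one form,
# so that form wins the vote.
```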