Enhancing SQL Query Generation with Neurosymbolic Reasoning

Authors: Henrijs Princis, Cristina David, Alan Mycroft

AAAI 2025

Reproducibility (Variable / Result / LLM Response)
Variable: Research Type
Result: Experimental
LLM Response: "Experiments on Xander, our open-source implementation, show it both reduces runtime and increases accuracy of the generated SQL. A specific result is an LM using Xander outperforming a four-times-larger LM. ... The results of this experiment can be seen in Table 1."
Variable: Researcher Affiliation
Result: Academia
LLM Response: "Henrijs Princis [1], Cristina David [2], Alan Mycroft [1]. [1] University of Cambridge, Cambridge CB3 0FD, UK; [2] University of Bristol, Bristol BS8 1QU, UK. EMAIL, EMAIL, EMAIL"
Variable: Pseudocode
Result: Yes
LLM Response: "Algorithm 1: SQL Query Generation with Xander"
Variable: Open Source Code
Result: Yes
LLM Response: "Code: https://github.com/henrijsprincis/Xander"
Variable: Open Datasets
Result: Yes
LLM Response: "Spider dataset: https://huggingface.co/datasets/xlangai/spider"
Variable: Dataset Splits
Result: Yes
LLM Response: "We used the Spider (Yu et al. 2018) dataset, which is the most challenging benchmark for cross-domain and multi-table text-to-SQL. The training set was used to fine-tune the network and the validation set was used to measure real-world accuracy and runtime. ... After removing those, we have 6779 queries in the training dataset, and 1018 in the validation dataset (compared to 7000 and 1034, respectively, in the original Spider dataset)."
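
The split sizes quoted above can be sanity-checked with a few lines of Python. The counts come directly from the quote; the paper's actual filtering criterion is not specified here, so only the arithmetic is verified:

```python
# Spider split sizes quoted in the report.
ORIGINAL = {"train": 7000, "validation": 1034}   # original Spider
FILTERED = {"train": 6779, "validation": 1018}   # after the paper's filtering

# Number of queries removed from each split.
removed = {split: ORIGINAL[split] - FILTERED[split] for split in ORIGINAL}
print(removed)  # {'train': 221, 'validation': 16}

# Fraction of each split that was dropped (roughly 3.2% / 1.5%).
dropped_pct = {s: 100 * removed[s] / ORIGINAL[s] for s in ORIGINAL}
```

So the filtering removed 221 training and 16 validation queries, a small fraction of the original dataset.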
Variable: Hardware Specification
Result: Yes
LLM Response: "Except those for Microsoft Phi-1.5, experiments were performed using a Tesla P100 GPU and an Intel Xeon 6142 CPU. For Microsoft Phi-1.5, due to the larger model size, we used an Amazon EC2 G5.xlarge instance with an A10G (24 GB) GPU."
Variable: Software Dependencies
Result: No
LLM Response: "Experiments used Python with the Hugging Face transformers library (Wolf et al. 2020)."
Variable: Experiment Setup
Result: Yes
LLM Response: "All networks except Microsoft Phi-1.5 were fitted for 50 epochs with a batch size of 10. The Adam (Kingma and Ba 2017) optimiser with a learning rate of 4e-5 was used to find the optimal weights. For Microsoft Phi-1.5, to save memory, a batch size of 1 was used and the RMSProp optimiser was used instead of Adam. To account for the larger network size, the learning rate was reduced to 4e-6 and the network was fitted for 5 epochs."
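
The two training regimes described above can be encoded as a small configuration table. This is a hedged sketch, not the authors' code: the `TrainConfig` class and `select_config` helper are illustrative names, while the hyperparameter values are taken verbatim from the quote:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    optimizer: str        # "adam" or "rmsprop"
    learning_rate: float
    batch_size: int
    epochs: int

# Default regime: all models except Microsoft Phi-1.5.
DEFAULT = TrainConfig(optimizer="adam", learning_rate=4e-5,
                      batch_size=10, epochs=50)

# Phi-1.5 is larger, so the report uses batch size 1 (to save memory),
# RMSProp instead of Adam, a 10x lower learning rate, and fewer epochs.
PHI_1_5 = TrainConfig(optimizer="rmsprop", learning_rate=4e-6,
                      batch_size=1, epochs=5)

def select_config(model_name: str) -> TrainConfig:
    """Pick the training configuration based on the model name."""
    return PHI_1_5 if "phi-1.5" in model_name.lower() else DEFAULT
```

In a PyTorch fine-tuning loop these settings would map onto `torch.optim.Adam(model.parameters(), lr=4e-5)` for the default regime and `torch.optim.RMSprop(model.parameters(), lr=4e-6)` for Phi-1.5.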