reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration

Authors: Jipeng Cen, Jiaxin Liu, Zhixu Li, Jingjing Wang

Venue: AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluated our proposed framework on five Text-to-SQL benchmarks. The experimental results show that our method consistently enhances the performance of the baseline model, specifically achieving an execution accuracy improvement of over 3% on the Bird benchmark.
Researcher Affiliation	Collaboration	Jipeng Cen (1), Jiaxin Liu (4), Zhixu Li (2, 3), Jingjing Wang (1, *). (1) School of Computer Science & Technology, Soochow University, Suzhou, China; (2) School of Information, Renmin University of China, Beijing, China; (3) International College (Suzhou Research Institute), Renmin University of China, Suzhou, China; (4) iFLYTEK Research (Suzhou), China
Pseudocode	Yes	Algorithm 1: The algorithm of SQLFixAgent
Open Source Code	Yes	To facilitate related research, all code will be released via GitHub.
Open Datasets	Yes	We evaluated our framework on two primary Text-to-SQL benchmarks: Spider (Yu et al. 2018) and Bird (Li et al. 2024b).
Dataset Splits	Yes	Spider offers a training set comprising 8,659 samples, a development set with 1,034 samples, and a test set with 2,147 samples, encompassing 200 distinct databases and 138 domains.
Hardware Specification	Yes	All experiments were conducted on a server equipped with 1 AMD EPYC 7352 CPU and 8 NVIDIA RTX 3090 GPUs.
Software Dependencies	No	In our experiments, we use the fine-tuned CodeS (Li et al. 2024a) as the SQLTool used by agents for Text-to-SQL parsing and employ GPT-3.5-turbo as the backbone LLM for the three agents. The paper names the software it uses but does not give specific version numbers.
Experiment Setup	Yes	For SQLTool inference, a beam search produces 4 SQL candidates, and we select the first executable one for further checking by SQLFixAgent. If an error is detected, SQLFixAgent attempts to repair it up to 3 times. These hyperparameters are tuned on the validation set.
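
The experiment setup in the last row can be read as a small control loop: take the beam candidates in order, keep the first one that executes, then check it and repair it a bounded number of times. The following is a minimal sketch of that loop, not the authors' implementation. The names `first_executable`, `select_and_repair`, `detect_error`, and `repair` are hypothetical, the two callbacks stand in for the LLM-backed agents, SQLite is assumed as the execution engine, and returning `None` when no candidate executes is an assumption the quoted text does not cover.

```python
import sqlite3
from typing import Callable, Optional, Sequence

MAX_REPAIR_ATTEMPTS = 3  # "up to 3 times" in the reported setup


def is_executable(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement runs against the database without error."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


def first_executable(conn: sqlite3.Connection,
                     candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate (in beam order) that executes, else None."""
    for sql in candidates:
        if is_executable(conn, sql):
            return sql
    return None


def select_and_repair(conn: sqlite3.Connection,
                      candidates: Sequence[str],
                      detect_error: Callable[[str], bool],
                      repair: Callable[[str], str],
                      max_attempts: int = MAX_REPAIR_ATTEMPTS) -> Optional[str]:
    """Pick the first executable candidate, then repair it at most max_attempts times.

    detect_error and repair are hypothetical placeholders for the agent calls.
    """
    sql = first_executable(conn, candidates)
    if sql is None:
        return None  # assumption: the source does not say what happens here
    for _ in range(max_attempts):
        if not detect_error(sql):
            break  # the checker accepts the query, so stop repairing
        sql = repair(sql)
    return sql
```

With a beam of 4 candidates, `candidates` would hold the 4 SQL strings in beam order. The loop calls `repair` at most 3 times and returns the last repaired query even if the checker would still flag it, which is one possible reading of "attempts to repair it up to 3 times".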