TrustUQA: A Trustful Framework for Unified Structured Data Question Answering
Authors: Wen Zhang, Long Jin, Yushan Zhu, Jiaoyan Chen, Zhiwei Huang, Junjie Wang, Yin Hua, Lei Liang, Huajun Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have evaluated TrustUQA with 5 benchmarks covering 3 types of structured data. It outperforms 2 existing unified structured data QA methods. In comparison with the baselines that are specific to one data type, it achieves state-of-the-art on 2 of the datasets. Furthermore, we have demonstrated the potential of our method for more general QA tasks, QA over mixed structured data and QA across structured data. |
| Researcher Affiliation | Collaboration | 1 Zhejiang University; 2 University of Manchester; 3 Ant Group; 4 ZJU-Ant Group Joint Lab of Knowledge Graph; 5 Zhejiang Key Laboratory of Big Data Intelligent Computing |
| Pseudocode | No | The paper describes query functions like 'get_information', 'search_node', and 'search_condition' and their translation rules in Table 1, but these are presented as textual descriptions and a mapping table, not as a structured pseudocode block or algorithm. |
| Open Source Code | Yes | Code: https://github.com/zjukg/TrustUQA |
| Open Datasets | Yes | We adopt 5 datasets covering 3 data types: WikiSQL (2017) and WTQ (2015) for table, WebQuestionsSP (WebQSP) (2016) and MetaQA (2018) for KG, and CronQuestions (2021) for temporal KG. |
| Dataset Splits | No | The paper uses well-known datasets (WikiSQL, WTQ, WebQSP, MetaQA, and CronQuestions) and mentions constructing demonstrations. However, it does not explicitly specify the training, validation, or test splits for these datasets, referring only to 'official' or 'processed versions' without detailing the split methodology or sizes for the main evaluation. |
| Hardware Specification | Yes | Our system is equipped with 2× NVIDIA A100 PCIe 40GB GPUs and 40 physical cores across 2 sockets (20 cores per socket). The Intel Xeon Gold 6148 processors operate at a base speed of 2.40 GHz, with a maximum of 3.70 GHz. |
| Software Dependencies | Yes | We use GPT-3.5 (gpt-3.5-turbo-0613) as the LLM with a self-consistency strategy of 5 samples, and Sentence-BERT (2019) as the dense text encoder. |
| Experiment Setup | Yes | We use GPT-3.5 (gpt-3.5-turbo-0613) as the LLM with a self-consistency strategy of 5 samples, and Sentence-BERT (2019) as the dense text encoder. If the answer is None due to mismatched entity-relation pairs, key-value inconsistencies, etc., we implement a retry mechanism with up to 3 trials. We set the number of retrieves m = 15 and the number of demonstrations k = 8. |
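The Pseudocode row notes that the paper describes query functions like `get_information`, `search_node`, and `search_condition` only textually (with translation rules in its Table 1). As a rough illustration of what such an interface over a triple-based condition graph might look like, here is a minimal sketch; the `ConditionGraph` class, the triple representation, and all function signatures are assumptions for illustration, not the paper's actual implementation:

```python
class ConditionGraph:
    """Hypothetical triple store standing in for the paper's condition graph."""

    def __init__(self):
        # Each entry is a (head, relation, tail) triple.
        self.triples = []

    def add(self, head, relation, tail):
        self.triples.append((head, relation, tail))

    def get_information(self, head=None, relation=None, tail=None):
        """Return all triples matching the fixed slots (None acts as a wildcard)."""
        return [t for t in self.triples
                if (head is None or t[0] == head)
                and (relation is None or t[1] == relation)
                and (tail is None or t[2] == tail)]

    def search_node(self, relation, tail):
        """Return nodes whose `relation` edge points to the given tail value."""
        return {h for h, r, t in self.triples if r == relation and t == tail}

    def search_condition(self, relation, predicate):
        """Return nodes whose `relation` tail satisfies a condition predicate."""
        return {h for h, r, t in self.triples if r == relation and predicate(t)}
```

For example, `search_condition("population", lambda v: v > 1_000_000)` would retrieve nodes whose population value exceeds one million, which is the flavor of conditional lookup the translation rules in the paper's Table 1 appear to target.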
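The Experiment Setup row mentions two reliability mechanisms: self-consistency with 5 samples and a retry mechanism with up to 3 trials when execution yields None. A minimal sketch of how these could compose is below; `generate_queries` and `execute` are hypothetical stand-ins for the LLM call and the query executor, not functions from the TrustUQA codebase:

```python
from collections import Counter

def answer_with_self_consistency(question, generate_queries, execute,
                                 n_samples=5, max_retries=3):
    """Sample n_samples candidate query programs, execute each, and
    majority-vote over the non-None answers. If every execution in a
    round fails (e.g. mismatched entity-relation pairs), retry the
    whole round, up to max_retries rounds."""
    for _ in range(max_retries):
        answers = []
        for query in generate_queries(question, n_samples):
            result = execute(query)
            if result is not None:
                answers.append(result)
        if answers:
            # Majority vote across the successful executions.
            return Counter(answers).most_common(1)[0][0]
    return None
```

This is only one plausible composition of the two mechanisms; the paper does not specify whether retries re-sample all queries or only the failed ones.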