Trigger3: Refining Query Correction via Adaptive Model Selector

Authors: Kepu Zhang, Zhongxiang Sun, Xiao Zhang, Xiaoxue Zang, Kai Zheng, Yang Song, Jun Xu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. To validate the effectiveness and efficiency of the proposed Trigger3 framework, we conduct experiments on two query correction datasets, using three small models and two LLMs. The results consistently demonstrate that Trigger3 achieves optimal performance and high efficiency.
Researcher Affiliation: Collaboration. Kepu Zhang,1 Zhongxiang Sun,1 Xiao Zhang,1,* Xiaoxue Zang,2 Kai Zheng,2 Yang Song,2 Jun Xu1. 1 Gaoling School of Artificial Intelligence, Renmin University of China; 2 Kuaishou Technology Co., Ltd.
Pseudocode: Yes. Algorithm 1: Process flow of Trigger3. Input: original query x and Trigger3's models. Output: final corrected query y_final.
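The quoted pseudocode gives only the algorithm's input/output signature. As a hedged illustration of what an adaptive small-model/LLM selector for query correction can look like, here is a minimal Python sketch; the trigger functions, routing order, and toy corrections below are assumptions for illustration, not the paper's actual Trigger3 algorithm:

```python
# Hypothetical sketch of an adaptive model selector for query correction.
# All functions below are illustrative stand-ins, not the paper's models.

def small_model_correct(query: str) -> str:
    """Stand-in for the cheap small correction model."""
    return query.replace("helo", "hello")  # toy correction

def llm_correct(query: str) -> str:
    """Stand-in for the more expensive LLM corrector."""
    return query.replace("helo", "hello")

def needs_correction(query: str) -> bool:
    """Stand-in trigger: does the query look erroneous at all?"""
    return "helo" in query

def small_model_suffices(query: str, candidate: str) -> bool:
    """Stand-in trigger: is the small model's correction trustworthy?"""
    return candidate != query  # toy confidence check

def trigger3(query: str) -> str:
    if not needs_correction(query):
        return query                   # pass through: nothing to fix
    candidate = small_model_correct(query)
    if small_model_suffices(query, candidate):
        return candidate               # cheap path: small model wins
    return llm_correct(query)          # escalate to the LLM

print(trigger3("helo world"))   # hello world (small-model path)
print(trigger3("hello world"))  # hello world (pass-through path)
```

The design intent such a flow captures is cost control: the LLM is invoked only when the triggers judge the cheaper paths insufficient.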
Open Source Code: Yes. The source code, datasets, more experimental results and details can be found at the following link: https://github.com/ke-01/Trigger3.
Open Datasets: Yes. QQ is a publicly available search-related dataset; due to the lack of publicly available query correction datasets, we adapt it into a query correction dataset. Following (Ye et al. 2023), we first use a language model to filter the queries, selecting those with a high probability of being correct. We then apply operations similar to those used for the Commercial dataset to these queries to construct a query correction dataset. The source code, datasets, more experimental results and details can be found at the following link: https://github.com/ke-01/Trigger3.
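The filtering step described above (keep only queries the language model scores as likely correct) can be sketched as a threshold on average token log-probability. The scoring function and threshold below are illustrative stand-ins, not the paper's actual setup:

```python
# Hedged sketch of LM-based query filtering: keep queries whose average
# token log-probability exceeds a threshold, i.e. queries the LM considers
# likely to be well-formed. The scorer and threshold are illustrative.

def avg_log_prob(query: str) -> float:
    """Stand-in scorer; a real implementation would run the query through
    a language model and average the per-token log-probabilities."""
    toy_scores = {"weather today": -1.2, "wether tody": -4.8}
    return toy_scores.get(query, -3.0)

def filter_queries(queries, threshold=-2.0):
    # Queries scored above the threshold are treated as likely correct.
    return [q for q in queries if avg_log_prob(q) > threshold]

print(filter_queries(["weather today", "wether tody"]))
# ['weather today']
```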
Dataset Splits: Yes. Table 1: Statistics of the used query correction datasets.

Split  Dataset     Avg len  #Query     Error Rate
Train  Commercial  9.43     1,444,213  97.8%
Train  QQ          9.81     111,703    79.1%
Valid  Commercial  9.41     14,737     97.8%
Valid  QQ          9.78     12,412     75.1%
Test   Commercial  9.43     14,737     97.8%
Test   QQ          9.79     13,791     74.7%
Hardware Specification: Yes. All experiments are performed on NVIDIA V100 32GB GPUs.
Software Dependencies: No. Our code implementation is based on Huggingface Transformers (Wolf et al. 2020) in PyTorch. No specific version numbers for Huggingface Transformers or PyTorch are provided.
Experiment Setup: Yes. We utilize the Adam (Kingma and Ba 2014) optimizer, setting the initial learning rate to 5e-5, the batch size to 16, and applying a cosine learning rate schedule for 3 epochs.
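The stated schedule (base LR 5e-5, cosine decay over 3 epochs) can be sketched in a few lines of plain Python. The steps-per-epoch count and the decay-to-zero endpoint are assumptions here; the paper does not state a minimum learning rate:

```python
import math

# Minimal sketch of the reported training config: base LR 5e-5 with
# cosine decay over 3 epochs. Step counts are illustrative, and decaying
# to exactly zero is an assumption.

BASE_LR = 5e-5
BATCH_SIZE = 16
EPOCHS = 3

def cosine_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Cosine-annealed learning rate from base_lr down to 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

total_steps = EPOCHS * 1000  # e.g. 1000 optimizer steps per epoch (assumed)
print(cosine_lr(0, total_steps))            # 5e-05 at the start of training
print(cosine_lr(total_steps, total_steps))  # 0.0 at the end of training
```

In a PyTorch training loop this corresponds to pairing `torch.optim.Adam` with a cosine schedule such as `torch.optim.lr_scheduler.CosineAnnealingLR`.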