Improving Consistency Identification in Task-oriented Dialogue Through Multi-Agent Collaboration

Authors: Peng Wang, Shuo Li, Ruoxi Zhou, Qiguang Chen, Xiao Xu, Hao Fei, Dagang Li, Wanxiang Che, Libo Qin

IJCAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experiments on the standard benchmark reveal that our framework achieves superior performance. Additionally, we compare MAC-CIToD with the most advanced trained approaches and find that its zero-shot performance on most metrics even surpasses that of models after training on the CI-ToD dataset.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Central South University, China; 2 Key Laboratory of Data Intelligence and Advanced Computing in Provincial Universities, Soochow University, China; 3 Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, China; 4 School of Computing, National University of Singapore, Singapore; 5 School of Computer Science and Engineering, Macau University of Science and Technology, China
Pseudocode | No | The paper describes the model architecture and collaboration paradigms using mathematical formulas and descriptive text, but no distinct pseudocode or algorithm blocks are provided.
Open Source Code | Yes | To facilitate further research, our code will be available at https://github.com/WPENGxs/MAC-CIToD.
Open Datasets | Yes | Following previous work [Qin et al., 2021; Qin et al., 2022; Ding et al., 2024], we use the standard CI-ToD benchmark for experiments.
Dataset Splits | No | The paper cites previous work [Qin et al., 2021; Qin et al., 2022; Ding et al., 2024] for the "standard CI-ToD benchmark," but its main text does not explicitly provide dataset split information (percentages, sample counts, or instructions for reproducing the data partitioning).
Hardware Specification | No | The paper does not provide specific hardware details (GPU/CPU models, processor types, or memory amounts) used for its experiments. It only mentions various LLM backbones and, in the acknowledgments, the "High Performance Computing Center of Central South University," without further specification.
Software Dependencies | No | The paper states that "All open source models are obtained from Hugging Face Library [Wolf et al., 2020]." While it names a library, it does not specify version numbers for it or for other critical software dependencies (e.g., Python or PyTorch) that would be necessary for replication.
Experiment Setup | Yes | For the GPT series models, the temperature is 0.3, the top p is 1, and the output max token length is 512. For the open source models, the temperature is 0.7, the top p is 0.8, and the output max token length is 512.
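The decoding hyperparameters reported above can be captured in a small config sketch. This is a hypothetical illustration, not code from the paper's repository: the `decoding_config` helper and the name-prefix check for GPT-series models are assumptions made here for clarity.

```python
# Decoding settings as reported in the paper's experiment setup.
GPT_SERIES_CONFIG = {"temperature": 0.3, "top_p": 1.0, "max_tokens": 512}
OPEN_SOURCE_CONFIG = {"temperature": 0.7, "top_p": 0.8, "max_tokens": 512}


def decoding_config(model_name: str) -> dict:
    """Return the reported decoding hyperparameters for a backbone.

    Assumption: GPT-series backbones are identified by a "gpt" name
    prefix; all other backbones use the open-source settings.
    """
    if model_name.lower().startswith("gpt"):
        return GPT_SERIES_CONFIG
    return OPEN_SOURCE_CONFIG
```

These dictionaries match the keyword arguments of typical OpenAI-style chat completion APIs, so they could be unpacked directly into such a call (e.g., `client.chat.completions.create(model=..., messages=..., **decoding_config(model))`).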