Improving Consistency Identification in Task-oriented Dialogue Through Multi-Agent Collaboration
Authors: Peng Wang, Shuo Li, Ruoxi Zhou, Qiguang Chen, Xiao Xu, Hao Fei, Dagang Li, Wanxiang Che, Libo Qin
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the standard benchmark reveal that our framework achieves superior performance. Additionally, we compare MAC-CIToD with the most advanced trained approaches and find that its zero-shot performance on most metrics even surpasses that of models after training on the CI-ToD dataset. |
| Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Central South University, China; 2 Key Laboratory of Data Intelligence and Advanced Computing in Provincial Universities, Soochow University, China; 3 Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, China; 4 School of Computing, National University of Singapore, Singapore; 5 School of Computer Science and Engineering, Macau University of Science and Technology, China |
| Pseudocode | No | The paper describes the model architecture and collaboration paradigms using mathematical formulas and descriptive text, but no distinct pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | To facilitate further research, our code will be available at https://github.com/WPENGxs/MAC-CIToD. |
| Open Datasets | Yes | Following previous work [Qin et al., 2021; Qin et al., 2022; Ding et al., 2024], we use the standard CI-ToD benchmark for experiments. |
| Dataset Splits | No | The paper mentions using the "standard CI-ToD benchmark for experiments" by citing previous work [Qin et al., 2021; Qin et al., 2022; Ding et al., 2024]. However, it does not explicitly provide specific dataset split information (percentages, sample counts, or explicit instructions for reproducing the data partitioning) within its main text. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only mentions the use of various LLM backbones and general computing resources like the "High Performance Computing Center of Central South University" in the acknowledgments, without further specification. |
| Software Dependencies | No | The paper mentions that "All open source models are obtained from Hugging Face Library [Wolf et al., 2020]". While it names a library, it does not specify a version number for Hugging Face or any other critical software dependencies like Python, PyTorch, or TensorFlow, which are necessary for replication. |
| Experiment Setup | Yes | For the GPT series models, the temperature is 0.3, the top p is 1, and the output max token length is 512. For the open source model, the temperature is 0.7, the top p is 0.8, and the output max token length is 512. |
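The decoding settings reported in the Experiment Setup row can be summarized as a small configuration sketch. This is an illustrative reconstruction, not code from the paper's repository; the names `GPT_SERIES`, `OPEN_SOURCE`, and `decoding_config` are hypothetical.

```python
# Decoding hyperparameters as reported in the paper's experiment setup.
# Dictionary names are illustrative placeholders, not from the paper.
GPT_SERIES = {"temperature": 0.3, "top_p": 1.0, "max_tokens": 512}
OPEN_SOURCE = {"temperature": 0.7, "top_p": 0.8, "max_tokens": 512}


def decoding_config(model_name: str) -> dict:
    """Select the reported decoding config by backbone family.

    GPT-series models use the lower-temperature setting; all other
    (open-source) backbones use the higher-temperature setting.
    """
    if model_name.lower().startswith("gpt"):
        return GPT_SERIES
    return OPEN_SOURCE
```

A replication attempt would pass these values to the respective inference API (e.g., an OpenAI-compatible endpoint or a Hugging Face generation call); the paper does not specify further sampling parameters beyond temperature, top-p, and maximum output length.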