CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning
Authors: Yuanheng Fang, Guoqing Chao, Wenqiang Lei, Shaobo Li, Dianhui Chu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results on six datasets show the superiority of our proposed method CDW-CoT over the state-of-the-art methods. The main contributions of our work are summarized as follows: Our empirical evaluations confirm that the CDW-CoT framework substantially outperforms traditional CoT methods, achieving state-of-the-art accuracy across multiple datasets. |
| Researcher Affiliation | Academia | ¹Harbin Institute of Technology, Weihai, 264209, Shandong, China; ²Sichuan University, Chengdu, 610065, Sichuan, China. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Cluster-Based Prompt Candidate Pool Initialization; Algorithm 2: Distance-Weighted Prompt Selection and Inference |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | Yes | Commonsense Reasoning: CommonsenseQA (CSQA) (Talmor et al. 2018): A widely used dataset for evaluating commonsense reasoning through multiple-choice questions that require inferencing based on prior knowledge and context. StrategyQA (Geva et al. 2021): It contains questions requiring implicit multi-hop reasoning to derive yes/no answers, testing the model's ability to connect various pieces of information logically. Symbolic Reasoning: Letter (Wei et al. 2022): It involves tasks like last letter concatenation, designed to test the symbolic reasoning capabilities of models. Coin (Wei et al. 2022): It focuses on determining the state of a coin after a series of flips, evaluating the model's ability to track state changes through symbolic manipulations. Mathematical Reasoning: MultiArith (Roy and Roth 2016): It consists of multistep arithmetic word problems that require a sequence of operations to reach the solution, testing multi-step reasoning in arithmetic contexts. AQuA (Ling et al. 2017): It includes complex arithmetic word problems with multiple-choice answers, providing a benchmark for evaluating sophisticated reasoning and calculation skills. |
| Dataset Splits | Yes | Datasets were divided into training, evaluation, and test subsets with proportions of approximately 60%, 25%, and 15%, respectively (Wang et al. 2022b). After dividing the data, we identified the number of clusters according to the Auto-CoT setup, and then adjusted the number of clusters for certain datasets from the default 8 to 3, as shown in Table 2. Table 2: Data Split and Number of Clusters Statistics. |
| Hardware Specification | Yes | We conducted comparative experiments using both the LLaMA2 (13B) and LLaMA3 (8B) models, running on two NVIDIA 4090 GPUs locally. |
| Software Dependencies | No | The paper mentions using LLaMA2 (13B) and LLaMA3 (8B) models but does not provide specific version numbers for any other software dependencies such as programming languages, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | Pool Size: We maintained a consistent pool of 40 potential prompts for each dataset to enable thorough exploration of diverse reasoning pathways. Sample Size: During training, each instance was tested against five unique prompt combinations, assessing the effectiveness of various configurations. Temperature: A temperature of 0.3 was used to optimize prompt selection during testing. |
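The 60% / 25% / 15% split reported above can be sketched as follows. This is a minimal reconstruction, not the authors' released code; the function name and seed handling are our own, and integer arithmetic is used so the subset sizes are deterministic.

```python
import random

def split_dataset(examples, seed=0):
    """Shuffle and split into roughly 60% train / 25% eval / 15% test,
    the proportions the paper reports (following Wang et al. 2022b).
    Integer arithmetic makes the split sizes deterministic."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n = len(items)
    n_train = n * 60 // 100
    n_eval = n * 25 // 100
    train = items[:n_train]
    evaluation = items[n_train:n_train + n_eval]
    test = items[n_train + n_eval:]
    return train, evaluation, test

train, evaluation, test = split_dataset(range(100))
# 100 items -> 60 train, 25 eval, 15 test
```

The remainder after the train and eval cuts falls to the test subset, which matches the "approximately 15%" phrasing in the report.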
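Algorithm 2 ("Distance-Weighted Prompt Selection") is only named in the report, so the following is a hedged sketch of one plausible reading: each cluster holds a prompt probability distribution, and a test question's distribution is a mixture of these, weighted by inverse distance to the cluster centroids. The function name, the inverse-distance weighting, and the epsilon guard are all assumptions, not the paper's exact formulation.

```python
import math

def distance_weighted_mixture(question_vec, centroids, cluster_probs):
    """Mix per-cluster prompt distributions using inverse-distance
    weights to the cluster centroids (hypothetical simplification of
    the paper's Algorithm 2; the exact weighting scheme is an assumption)."""
    dists = [math.dist(question_vec, c) for c in centroids]
    eps = 1e-8  # guard against a question landing exactly on a centroid
    weights = [1.0 / (d + eps) for d in dists]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize to sum to 1
    # Weighted average of the per-cluster prompt distributions.
    n_prompts = len(cluster_probs[0])
    return [sum(weights[i] * cluster_probs[i][j] for i in range(len(weights)))
            for j in range(n_prompts)]
```

Because each per-cluster distribution sums to 1 and the weights are normalized, the mixture is itself a valid probability distribution over the prompt pool, leaning toward the distribution of the nearest cluster.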
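The hyperparameters the report extracts (pool size 40, five prompt combinations per training instance, temperature 0.3, default 8 clusters reduced to 3 for some datasets) can be collected into a single config sketch. The class and field names are our own, chosen for readability; only the values come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CDWCoTConfig:
    """Hyperparameters reported in the paper; field names are hypothetical."""
    pool_size: int = 40            # candidate prompts maintained per dataset
    samples_per_instance: int = 5  # prompt combinations tried per training instance
    temperature: float = 0.3       # sampling temperature used at test time
    n_clusters: int = 8            # Auto-CoT default; reduced to 3 for some datasets (Table 2)
```

A frozen dataclass keeps the reported settings immutable, so per-dataset variants (e.g. `n_clusters=3`) must be created explicitly rather than mutated in place.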