C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness

Authors: Yu Kang, Xianghui Sun, Liangyu Chen, Wei Zou

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments over four datasets from arithmetic and commonsense scenarios, showing that the proposed method is capable of compressing the length of generated CoT by up to more than 50% without compromising its effectiveness. Additionally, we design extensive experiments and discussions to analyze the contribution of different components in our approach, as well as to explore future research directions of CoT compression based on our method.
Researcher Affiliation | Industry | Yu Kang, Xianghui Sun, Liangyu Chen*, Wei Zou, Beike Inc., Beijing, China EMAIL
Pseudocode | No | The paper describes the C3oT framework and its components (Compressor, Conditioned Training, Conditioned Inference) in narrative form, supplemented by a diagram in Figure 1, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the described methodology.
Open Datasets | Yes | For math reasoning, we use GSM8K (Cobbe et al. 2021) and MathQA (Amini et al. 2019). As for commonsense reasoning, we use ECQA (Aggarwal et al. 2021) and StrategyQA (Geva et al. 2021).
Dataset Splits | Yes | We followed the training and testing set division as outlined in the original paper of each dataset used, trained C3oT on the training set, and evaluated its performance on the test set, excluding StrategyQA. Because the ground truths for the StrategyQA test set are inaccessible, we instead further split the original StrategyQA training set into training and test sets.
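The StrategyQA re-split described above can be sketched as a simple seeded shuffle-and-slice. The 10% test fraction, the seed, and the function name are illustrative assumptions; the paper does not report the exact ratio or seed it used.

```python
import random

def split_strategyqa(examples, test_fraction=0.1, seed=0):
    """Re-split the original StrategyQA training set into new train/test
    sets, since ground truths for the official test set are not public.

    NOTE: test_fraction and seed are assumptions for illustration only;
    the paper does not specify these values.
    """
    rng = random.Random(seed)
    shuffled = examples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Usage with dummy examples standing in for StrategyQA records.
data = [{"question": f"q{i}", "answer": i % 2 == 0} for i in range(100)]
train, test = split_strategyqa(data)
print(len(train), len(test))  # 90 10
```

Fixing the seed keeps the split reproducible across runs, which matters when the held-out set doubles as the only evaluation set.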
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the AdamW optimizer and LLaMA-2-Chat models, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | In this paper, we train C3oT based on LLaMA-2-Chat-7B and -13B (Touvron et al. 2023). We fine-tune the model for 2 epochs on each dataset using the AdamW optimizer with a sequence length of 2,048 tokens and a batch size of 128. The AdamW optimizer's hyperparameters are set as follows: β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁶, and weight decay of 0.001. We employ a cosine learning rate schedule with a maximum learning rate of 1 × 10⁻⁵.
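The reported setup can be collected into a single configuration, with the cosine schedule written out explicitly. The paper states a cosine schedule with a maximum learning rate of 1e-5 but not the warmup or floor behavior, so the zero-floor, no-warmup form below is an assumption; the `config` dict only restates values quoted above.

```python
import math

# Hyperparameters as reported in the paper's experiment setup section.
config = {
    "base_models": ["LLaMA-2-Chat-7B", "LLaMA-2-Chat-13B"],
    "epochs": 2,
    "seq_len": 2048,
    "batch_size": 128,
    "adamw": {"betas": (0.9, 0.999), "eps": 1e-6, "weight_decay": 0.001},
    "max_lr": 1e-5,
    "lr_schedule": "cosine",
}

def cosine_lr(step, total_steps, max_lr):
    """Cosine decay from max_lr at step 0 down to 0 at total_steps.

    ASSUMPTION: the paper does not describe warmup or a learning-rate
    floor, so this sketch uses neither.
    """
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * step / total_steps))

# The schedule starts at the maximum rate and decays toward zero.
print(cosine_lr(0, 1000, config["max_lr"]))    # 1e-05
print(cosine_lr(500, 1000, config["max_lr"]))  # ~5e-06, the halfway point
```

In practice this curve would be evaluated once per optimizer step, with `total_steps` derived from the dataset size, the batch size of 128, and the 2 training epochs.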