Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

Authors: Juno Kim, Denny Wu, Jason D. Lee, Taiji Suzuki

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Under this framework, we prove that implementing a search protocol that rewards sparse edges improves CoT by decreasing the expected number of steps to reach different clusters. In contrast, we establish a limit on reasoning capability when the model is restricted to local information of the pretrained graph. We also show that the information gained by search can be utilized to obtain a better reasoning model: (1) the pretrained model can be directly finetuned to favor sparse edges via policy gradient methods, and moreover (2) a compressed metastable representation of the reasoning dynamics can be distilled into a smaller, more efficient model. All proofs are deferred to the appendix. A discussion of additional related works is provided in Appendix A. Metastable dynamics and hitting times are studied in Appendices B-C, optimization dynamics are analyzed in Appendix D, and learning-theoretic lower bounds are given in Appendix E.
Researcher Affiliation | Academia | 1. University of Tokyo and RIKEN AIP; 2. New York University and Flatiron Institute; 3. Princeton University.
Pseudocode | Yes | Algorithm 1: Two-stage Pretraining; Algorithm 2: Sparse Edge Search; Algorithm 3: PPO-Clip; Algorithm 4: Meta-chain Distillation.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository.
Open Datasets | No | The paper introduces a perturbed Markov chain model for CoT reasoning that differentiates between easy and hard reasoning steps through a dense-sparse structure. It does not describe any experiments conducted on specific open datasets.
Dataset Splits | No | The paper does not conduct experiments on datasets and therefore does not specify any dataset splits.
Hardware Specification | No | The paper focuses on theoretical analysis and mathematical proofs; therefore, no specific hardware used for experiments is mentioned.
Software Dependencies | No | The paper is a theoretical work and does not describe any experimental implementation or software dependencies with version numbers.
Experiment Setup | No | The paper focuses on theoretical modeling and proofs, not on experimental results. Therefore, it does not provide details on experimental setup, hyperparameters, or training configurations.
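The paper's central claim — that rewarding sparse inter-cluster edges shortens the expected number of reasoning steps needed to reach a different cluster — can be illustrated with a toy simulation. The sketch below is an assumption-laden stand-in for the paper's construction, not its exact model: two clusters of states connected by a single "sparse" gateway edge of probability eps, with dense uniform moves inside each cluster. Boosting eps (mimicking a search or finetuned policy that favors the sparse edge) sharply reduces the mean hitting time of the other cluster.

```python
import random

def hitting_time(eps, n_per_cluster=4, max_steps=10**6, rng=None):
    """Steps until a walk started in cluster 0 first enters cluster 1.

    States 0..n-1 form cluster 0, states n..2n-1 form cluster 1.
    State n-1 is a "gateway": with probability eps it traverses the
    sparse inter-cluster edge; otherwise the walk makes a dense move,
    jumping uniformly within its current cluster.
    """
    rng = rng or random.Random(0)
    n = n_per_cluster
    state = 0
    for t in range(1, max_steps + 1):
        if state == n - 1 and rng.random() < eps:
            state = n                        # sparse inter-cluster edge
        else:
            base = 0 if state < n else n
            state = base + rng.randrange(n)  # dense intra-cluster move
        if state >= n:
            return t
    return max_steps

rng = random.Random(42)
trials = 2000
base = sum(hitting_time(eps=0.01, rng=rng) for _ in range(trials)) / trials
boosted = sum(hitting_time(eps=0.5, rng=rng) for _ in range(trials)) / trials
print(f"mean hitting time, base chain (eps=0.01):      {base:.1f}")
print(f"mean hitting time, sparse-boosted (eps=0.5):   {boosted:.1f}")
```

A first-step analysis of this toy chain gives expected hitting times of roughly 401 steps for eps=0.01 versus 9 for eps=0.5, which the simulation reproduces: the metastable bottleneck is the rarely-taken sparse edge, and any mechanism that upweights it (search, policy-gradient finetuning) collapses the waiting time.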