Accelerating Large Language Model Reasoning via Speculative Search

Authors: Zhihai Wang, Jie Wang, Jilai Pan, Xilin Xia, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Feng Wu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both the Qwen and Llama models demonstrate that SpecSearch significantly outperforms state-of-the-art approaches, achieving up to a 2.12× speedup with comparable reasoning quality.
Researcher Affiliation | Collaboration | 1 MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China; 2 Noah's Ark Lab, Huawei Technologies; 3 College of Intelligence and Computing, Tianjin University. Correspondence to: Jie Wang <EMAIL>.
Pseudocode | Yes | The procedure is summarized in Algorithm 1. ... Furthermore, we present the complete SpecSearch algorithm, which builds on beam search, as shown in Algorithm 2.
Open Source Code | Yes | Code is available at https://github.com/MIRALab-USTC/LLMReasoning-SpecSearch.
Open Datasets | Yes | We use two well-established mathematical problem datasets, GSM8K (Cobbe et al., 2021) and MATH (Hendrycks et al., 2021), to evaluate the acceleration performance of the proposed framework.
Dataset Splits | No | We randomly select 100 samples from both the GSM8K and MATH datasets for evaluation. ... we select 50 mathematical problems from the MATH dataset as the test set for the ablation study. This indicates random sampling without a fixed seed or documented procedure for exactly reproducing the splits used.
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or the exact computing environment) are provided. The paper only names the models used: 'We use quantized Qwen2.5-72B-Instruct and Qwen2.5-7B-Instruct (Team, 2024) as large and small models, respectively, along with quantized Llama3-70B-Instruct and Llama3-8B-Instruct (Dubey et al., 2024).'
Software Dependencies | No | The paper mentions using the vLLM (Kwon et al., 2023) package and OpenR but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | Unless stated otherwise, experiments follow OpenR (Wang et al., 2024a) settings: tree width of 6, tree depth of 50, MATH-psa as the process reward model (PRM), Qwen models as the main LLMs, and beam search as the main search algorithm. Throughout all experiments, we set the EMA weight θ in SpecSearch to 0.9.
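The Dataset Splits gap above is what makes the evaluation subsets irreproducible: drawing 100 samples at random without a recorded seed yields a different split on every run. A minimal sketch of what a reproducible selection would look like (the function name and seed value are hypothetical; the paper reports neither):

```python
import random

def select_eval_subset(dataset_ids, n_samples, seed=0):
    """Draw a deterministic evaluation subset from a list of problem IDs.

    The seed is illustrative only; the paper does not report one.
    """
    rng = random.Random(seed)  # isolated RNG, leaves global random state untouched
    return sorted(rng.sample(dataset_ids, n_samples))

# Example: pick 100 of GSM8K's 1319 test problems deterministically.
subset = select_eval_subset(list(range(1319)), 100, seed=0)
assert subset == select_eval_subset(list(range(1319)), 100, seed=0)  # same seed → same split
```

Publishing the seed (or the selected problem IDs themselves) would let the 100-sample and 50-sample subsets be reconstructed exactly.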
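The EMA weight θ = 0.9 in the Experiment Setup row controls how heavily SpecSearch's running estimate favors history over new observations. The update below is the standard exponential-moving-average form, shown only to clarify what θ = 0.9 means; how the paper applies the estimate inside the search is not reproduced here:

```python
def ema_update(prev, new_value, theta=0.9):
    """Exponential moving average: keep a theta fraction of the old
    estimate and blend in (1 - theta) of the new observation."""
    return theta * prev + (1.0 - theta) * new_value

# With theta = 0.9, the estimate adapts slowly: starting at 0.5 and
# repeatedly observing 0.8 moves it to 0.53, then 0.557, then 0.5813.
estimate = 0.5
for reward in [0.8, 0.8, 0.8]:
    estimate = ema_update(estimate, reward)
```

A large θ smooths out noisy per-step reward signals at the cost of slower adaptation, which is why EMA weights near 0.9 are a common default.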