Accelerating Large Language Model Reasoning via Speculative Search
Authors: Zhihai Wang, Jie Wang, Jilai Pan, Xilin Xia, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Feng Wu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both the Qwen and Llama models demonstrate that SpecSearch significantly outperforms state-of-the-art approaches, achieving up to a 2.12× speedup with comparable reasoning quality. |
| Researcher Affiliation | Collaboration | (1) MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China; (2) Noah's Ark Lab, Huawei Technologies; (3) College of Intelligence and Computing, Tianjin University. Correspondence to: Jie Wang <EMAIL>. |
| Pseudocode | Yes | The procedure is summarized in Algorithm 1. ... Furthermore, we present the complete SpecSearch algorithm, which is based on the beam search algorithm, as shown in Algorithm 2. |
| Open Source Code | Yes | Code is available at https://github.com/MIRALab-USTC/LLMReasoning-SpecSearch. |
| Open Datasets | Yes | We use two well-established mathematical problem datasets, GSM8K (Cobbe et al., 2021) and MATH (Hendrycks et al., 2021), to evaluate the acceleration performance of the proposed framework. |
| Dataset Splits | No | We randomly select 100 samples from both the GSM8K and MATH datasets for evaluation. ... For the ablation study... we select 50 mathematical problems from the MATH dataset as the test set for the ablation study. The samples were drawn randomly without reported seeds or a documented selection procedure, so the exact splits cannot be reproduced. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or exact computing environments) are provided. The paper only mentions the models used: 'We use quantized Qwen2.5-72B-Instruct and Qwen2.5-7B-Instruct (Team, 2024) as large and small models, respectively, along with quantized Llama3-70B-Instruct and Llama3-8B-Instruct (Dubey et al., 2024).' |
| Software Dependencies | No | The paper mentions using the 'vLLM (Kwon et al., 2023) package' and 'OpenR' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Unless stated otherwise, experiments follow OpenR (Wang et al., 2024a) settings: tree width of 6, tree depth of 50, MATH-psa as the process reward model (PRM), Qwen models as the main LLMs, and beam search as the main search algorithm. Throughout all experiments, we set the EMA weight θ in SpecSearch to 0.9. |
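The EMA weight θ = 0.9 in the setup row refers to a standard exponential moving average. The paper excerpt quoted above does not say which quantity SpecSearch smooths, so the sketch below is a generic, minimal illustration of an EMA update with θ weighting the previous estimate; the `score` values are hypothetical.

```python
def ema_update(prev: float, x: float, theta: float = 0.9) -> float:
    """One exponential-moving-average step.

    theta weights the previous estimate (0.9 per the paper's setting),
    so each new observation x contributes with weight 1 - theta.
    """
    return theta * prev + (1.0 - theta) * x

# Hypothetical usage: smoothing a noisy sequence of per-step scores.
estimate = 0.5
for score in [0.6, 0.4, 0.7]:
    estimate = ema_update(estimate, score)
```

With θ = 0.9 the estimate adapts slowly, which damps per-step noise at the cost of reacting more gradually to genuine shifts in the smoothed quantity.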