Polybasic Speculative Decoding Through a Theoretical Perspective
Authors: Ruilin Wang, Huixia Li, Yuexiao Ma, Xiawu Zheng, Fei Chao, Xuefeng Xiao, Rongrong Ji
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across multiple model families demonstrate that our approach yields speedup ratios ranging from 3.31 to 4.01 for LLaMA2-Chat 7B, up to 3.87 for LLaMA3-8B, up to 4.43 for Vicuna-7B, and up to 3.85 for Qwen2-7B, all while preserving the original output distribution. We release our theoretical proofs and implementation code to facilitate further investigation into polybasic speculative decoding. |
| Researcher Affiliation | Collaboration | 1Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China 2ByteDance 3Institute of Artificial Intelligence, Xiamen University 4Peng Cheng Laboratory, Shenzhen, China. |
| Pseudocode | Yes | Algorithm 1 Polybasic Speculative Decoding (Three Models) |
| Open Source Code | Yes | We release our theoretical proofs and implementation code to facilitate further investigation into polybasic speculative decoding. |
| Open Datasets | Yes | We evaluated our multi-model speculative system on SpecBench (Xia et al., 2024), across multiple tasks including multi-turn conversation, translation, summarization, question answering, mathematical reasoning, and retrieval-augmented generation, employing MT-bench (Zheng et al., 2023), WMT14 DE-EN, CNN/Daily Mail (Nallapati et al., 2016), Natural Questions (Kwiatkowski et al., 2019), GSM8K (Cobbe et al., 2021), and DPR (Karpukhin et al., 2020). |
| Dataset Splits | Yes | We evaluated our multi-model speculative system on SpecBench (Xia et al., 2024), across multiple tasks including multi-turn conversation, translation, summarization, question answering, mathematical reasoning, and retrieval-augmented generation, employing MT-bench (Zheng et al., 2023), WMT14 DE-EN, CNN/Daily Mail (Nallapati et al., 2016), Natural Questions (Kwiatkowski et al., 2019), GSM8K (Cobbe et al., 2021), and DPR (Karpukhin et al., 2020). |
| Hardware Specification | Yes | Our experiments run on NVIDIA A800 80G GPUs. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Speculative sampling (Leviathan et al., 2023) conducted experiments with a batch size of 1; similarly, the majority of our experiments also adopted this setting. For the intermediate model, we adopt 4-bit quantization (Ma et al., 2024) with a group size of 128, balancing reduced inference cost against quality. Draft models are built following EAGLE2, trained on ShareGPT data. |
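The pseudocode row refers to a three-model scheme (draft → intermediate → target) in which each stage verifies the previous stage's proposals while the final output distribution matches the target model exactly. Below is a minimal sketch of that per-token accept/reject chain on toy categorical distributions; the function names, the toy distributions, and the single-token framing are illustrative assumptions, not the paper's implementation (which drafts and verifies multi-token blocks on real LLMs).

```python
import random

def sample(dist):
    """Draw a token index from a categorical distribution (list of probs)."""
    r, acc = random.random(), 0.0
    for tok, p in enumerate(dist):
        acc += p
        if r < acc:
            return tok
    return len(dist) - 1  # guard against floating-point round-off

def speculative_accept(token, p_small, p_big):
    """Standard speculative-sampling step (Leviathan et al., 2023):
    accept the drafted token with prob min(1, p_big/p_small); otherwise
    resample from the normalized residual max(0, p_big - p_small).
    The returned token is distributed exactly according to p_big."""
    if random.random() < min(1.0, p_big[token] / p_small[token]):
        return token
    residual = [max(0.0, b - s) for s, b in zip(p_small, p_big)]
    z = sum(residual)
    return sample([r / z for r in residual])

def polybasic_step(chain):
    """One token through a chain of models ordered small -> large.
    Each adjacent pair runs an accept/reject step, so the output follows
    the last (target) model's distribution regardless of chain length."""
    tok = sample(chain[0])
    for p_small, p_big in zip(chain, chain[1:]):
        tok = speculative_accept(tok, p_small, p_big)
    return tok
```

Because each pairwise step is distribution-preserving, chaining them keeps the losslessness guarantee the paper claims; the speedup comes from the cheap models answering most steps, which this toy example does not measure.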