Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting

Authors: Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Duyu Tang, Kai Han, Yunhe Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on multiple benchmarks demonstrate the effectiveness of Kangaroo, where it achieves walltime speedups up to 2.04×, outperforming Medusa-1 with 88.7% fewer additional parameters.
Researcher Affiliation | Industry | Huawei Noah's Ark Lab; Consumer Business Group, Huawei
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code for Kangaroo is available at https://github.com/Equationliu/Kangaroo.
Open Datasets | Yes | We conduct experiments on Vicuna [12] models with sizes of 7B and 13B. ... For Kangaroo, we train the adapter A for 10 epochs with the AdamW [42] optimizer on the ShareGPT dataset following Medusa [20].
Dataset Splits | Yes | We evaluate the acceleration performance with the recently proposed Spec-Bench [22], which consists of six subtasks: Multi-turn Conversation, Translation, Summarization, Question Answering, Mathematical Reasoning, and Retrieval-augmented Generation.
Hardware Specification | Yes | The training of the adapter A for Vicuna-7B takes around 24 hours on 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions using the AdamW [42] optimizer but does not provide version numbers for other software dependencies such as programming languages or libraries.
Experiment Setup | Yes | For Kangaroo, we train the adapter A for 10 epochs with the AdamW [42] optimizer on the ShareGPT dataset following Medusa [20]. ... During the inference stage, we set ℓ = 2 for Vicuna-7B and ℓ = 3 for Vicuna-13B. For single-sequence decoding in Kangaroo, we set γ = 6 and η = 0.6. For the dynamic tree decoding scenario, we set Top-K to 10 and η = 0.4. (See the illustrative sketches after this table.)
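The quoted training setup (adapter A, 10 epochs, AdamW, ShareGPT) can be sketched roughly as follows. The `Adapter` module, the learning rate, and the distillation-style objective below are assumptions introduced purely for illustration; the paper's exact adapter architecture and loss are not quoted in this report, so consult https://github.com/Equationliu/Kangaroo for the authors' actual implementation.

```python
# Hedged sketch of the quoted adapter-training setup: AdamW for 10 epochs on
# ShareGPT. Architecture and objective are assumptions, not the paper's spec.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Hypothetical stand-in for Kangaroo's adapter A, which bridges the
    hidden states of the first ℓ layers to the frozen LM head."""
    def __init__(self, hidden_size: int, num_heads: int = 32):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, h):
        out, _ = self.attn(h, h, h)
        return self.norm(h + out)

def train_adapter(adapter, shallow_hidden_states, target_logits, lm_head, epochs=10):
    # Only the adapter is trained; the base model and its LM head stay frozen,
    # which is why Kangaroo needs far fewer additional parameters than Medusa-1.
    opt = torch.optim.AdamW(adapter.parameters(), lr=2e-4)  # lr is an assumed value
    loss_fn = nn.KLDivLoss(reduction="batchmean")
    for _ in range(epochs):
        for h, t in zip(shallow_hidden_states, target_logits):
            draft_logits = lm_head(adapter(h))
            # Assumed distillation-style objective: match the full model's
            # output distribution (the paper's exact loss may differ).
            loss = loss_fn(draft_logits.log_softmax(-1), t.softmax(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
```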
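The quoted single-sequence inference hyperparameters (γ = 6 draft steps, confidence threshold η = 0.6) are easiest to read in code. Below is a minimal sketch of one draft-then-verify cycle under greedy decoding; `kangaroo_step`, `shallow_forward`, and `full_forward` are hypothetical names for this illustration, and the released implementation differs in important details (KV caching, dynamic tree decoding).

```python
# Minimal sketch of Kangaroo-style single-sequence self-speculative decoding.
import torch

GAMMA = 6  # max draft tokens per cycle (paper: γ = 6 for single-sequence decoding)
ETA = 0.6  # confidence threshold for exiting the drafting phase (paper: η = 0.6)

@torch.no_grad()
def kangaroo_step(tokens, shallow_forward, full_forward):
    """One draft-then-verify cycle. `tokens` is a list of token ids; both
    forward functions return per-position logits of shape [seq_len, vocab]."""
    # Drafting: the shallow sub-network (first ℓ layers + adapter + shared LM
    # head) proposes tokens until its confidence drops below η or γ is reached.
    draft = []
    for _ in range(GAMMA):
        logits = shallow_forward(tokens + draft)
        conf, tok = torch.softmax(logits[-1], dim=-1).max(dim=-1)
        if conf.item() < ETA:  # early exit: stop drafting when the draft is unsure
            break
        draft.append(int(tok))

    # Verification: one full-model pass scores all draft tokens at once; accept
    # the longest prefix that agrees with the full model's greedy choices.
    logits = full_forward(tokens + draft)
    accepted = []
    for i, tok in enumerate(draft):
        target = int(logits[len(tokens) + i - 1].argmax(dim=-1))
        if tok != target:
            accepted.append(target)  # correct the first mismatch, then stop
            break
        accepted.append(tok)
    else:
        # Every draft token was accepted: take one bonus token from the full model.
        accepted.append(int(logits[-1].argmax(dim=-1)))
    return tokens + accepted
```

Because verification replays the draft through the unmodified target model and keeps only tokens that model itself would emit, the procedure is lossless under greedy decoding; the speedup comes from scoring several draft tokens in a single full-model pass.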