Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
Authors: Ashish Khisti, MohammadReza Ebrahimi, Hassan Dbouk, Arash Behboodi, Roland Memisevic, Christos Louizos
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical analysis also motivates a new class of token-level selection schemes based on weighted importance sampling. Our experimental results demonstrate consistent improvements in the achievable block efficiency and token rates over baseline schemes in a number of scenarios. |
| Researcher Affiliation | Collaboration | Ashish Khisti (1,2), M.Reza Ebrahimi (1), Hassan Dbouk (1), Arash Behboodi (1), Roland Memisevic (1), Christos Louizos (1). (1) Qualcomm AI Research, (2) University of Toronto. Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc. |
| Pseudocode | Yes | Algorithm 1 Speculative Sampling Algorithm 2 Truncated LP |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use the OPT models (Zhang et al., 2022), where the draft model has 125 million parameters and the target model has 13B parameters. For evaluation purposes we consider the datasets associated with the XSum (Narayan et al., 2018), Databricks-Dolly-15k (Conover et al., 2023) and the WMT18 (Bojar et al., 2018) tasks. |
| Dataset Splits | Yes | For evaluation purposes we consider the datasets associated with the XSum (Narayan et al., 2018), Databricks-Dolly-15k (Conover et al., 2023) and the WMT18 (Bojar et al., 2018) tasks. |
| Hardware Specification | Yes | We conduct experiments using an instance of A100 GPU with 80GB memory. |
| Software Dependencies | No | The paper mentions software like 'OPT models' but does not specify any software libraries or frameworks with their version numbers. |
| Experiment Setup | Yes | We set the temperature of the target model to 1.0 and the temperature of one draft model to 1.2, while varying the temperature of the other draft model over the range 1.0 to 2.4. In all our experiments we generate 5 tokens per call of the draft model. In the IS scheme we employ both truncated LP (with s = 5 as the truncation parameter) and a truncated alphabet (of size 40 tokens) as discussed in section 4. |
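For context on the baseline the paper improves upon (the "Speculative Sampling" pseudocode noted in the table), the token-level accept/reject rule of single-draft speculative sampling can be sketched as below. This is a minimal illustration, not the paper's multi-draft or importance-sampling scheme; all function names are ours, and the acceptance rule min(1, p(x)/q(x)) with residual resampling follows the standard speculative sampling construction.

```python
import random

def speculative_step(p, q, rng=None):
    """One token-level accept/reject step of (single-draft) speculative sampling.

    p: target-model distribution over the vocabulary (list of probabilities)
    q: draft-model distribution over the vocabulary (list of probabilities)
    Returns a token index whose marginal distribution is exactly p.
    """
    rng = rng or random.Random(0)
    # Draw a candidate token from the draft distribution q.
    x = rng.choices(range(len(q)), weights=q)[0]
    # Accept with probability min(1, p(x) / q(x)); q[x] > 0 since x ~ q.
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    z = sum(residual)
    return rng.choices(range(len(p)), weights=[r / z for r in residual])[0]

def exact_marginal(p, q):
    """Closed-form output distribution of speculative_step, for verification."""
    # Probability of drawing and accepting token i is min(p_i, q_i).
    accept = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_mass = 1.0 - sum(accept)
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    z = sum(residual)
    return [a + reject_mass * (r / z if z > 0 else 0.0)
            for a, r in zip(accept, residual)]
```

A quick check with any pair of distributions confirms the losslessness property that the paper's multi-draft analysis generalizes: `exact_marginal(p, q)` reproduces the target distribution `p` regardless of the draft distribution `q`.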