TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation

Authors: Tong Wu, Junzhe Shen, Zixia Jia, Yuxuan Wang, Zilong Zheng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that TOKENSWIFT achieves over 3× speedup across models of varying scales (1.5B, 7B, 8B, 14B) and architectures (MHA, GQA). This acceleration translates to hours of time savings for ultra-long sequence generation, establishing TOKENSWIFT as a scalable and effective solution at unprecedented lengths.
Researcher Affiliation | Collaboration | (1) State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China; (2) LUMIA Lab, Shanghai Jiao Tong University. Correspondence to: Zilong Zheng <EMAIL>. The affiliations include a 'State Key Laboratory' and 'BIGAI', along with 'Shanghai Jiao Tong University', a known academic institution. This combination of a state-funded research lab and a university indicates a collaborative affiliation.
Pseudocode | Yes | "In summary, the overall flow of our framework is presented in Algorithm 1."
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the code for the methodology described in this paper is open-source or publicly available.
Open Datasets | Yes | "The inference experiments are performed on the test set of PG-19 (Rae et al., 2020). ... We train the model on Wikipedia (20231101.en) and part of C4-en for 1 epoch." Dataset links: https://huggingface.co/datasets/wikimedia/wikipedia and https://huggingface.co/datasets/allenai/c4
Dataset Splits | No | "The inference experiments are performed on the test set of PG-19 (Rae et al., 2020). ... We train linear layers in Section 3.2 using the first 8K tokens of training data, for datasets longer than 8K tokens, from PG-19 (Rae et al., 2020)." While the paper mentions using a 'test set' and 'training data' from PG-19, it does not specify the exact split percentages, sample counts, or the methodology (e.g., random seed, stratified split) used to create these splits for reproducibility.
Hardware Specification | Yes | "Inference is performed on a single NVIDIA A100-SXM4-80GB. ... The model was trained on an NVIDIA A100-SXM4-80GB GPU."
Software Dependencies | No | "optimizer AdamW" (Table 10). While an optimizer is mentioned, no version numbers for this or any other software libraries or frameworks are provided.
Experiment Setup | Yes | "The number of extra decoding heads is set to 3 across all models." Table 10 (additional training details): optimizer AdamW; betas (0.9, 0.999); weight decay 0.1; warmup steps 50; learning rate scheduler cosine; num. GPUs 4; gradient accumulation steps 10. Table 11 (k is the maximum number of retrieved n-grams in token reutilization): LLaMA3.1-8b — k 20, temp. 1.0, top-p 0.9, min-p 0.05, penalty 1.2, penalty len. 1024.
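For a reader attempting reproduction, the hyperparameters scattered across Tables 10 and 11 can be gathered into a single configuration. This is a hedged sketch only: the paper's code is not public, so the field names below are illustrative rather than the authors' actual identifiers.

```python
# Hypothetical configuration mirroring Tables 10 and 11 of the paper.
# All key names are this report's own; only the values come from the paper.

training_config = {
    "optimizer": "AdamW",
    "betas": (0.9, 0.999),
    "weight_decay": 0.1,
    "warmup_steps": 50,
    "lr_scheduler": "cosine",
    "num_gpus": 4,
    "gradient_accumulation_steps": 10,
    "num_extra_decoding_heads": 3,  # "set to 3 across all models"
}

# Per-model inference settings (Table 11); "k" caps the number of
# retrieved n-grams during token reutilization.
sampling_config = {
    "LLaMA3.1-8b": {
        "k": 20,
        "temperature": 1.0,
        "top_p": 0.9,
        "min_p": 0.05,
        "repetition_penalty": 1.2,
        "penalty_length": 1024,
    }
}

# The effective batch size would scale with num_gpus ×
# gradient_accumulation_steps × the per-device batch size, but the
# per-device batch size is not stated in this excerpt.
effective_batch_multiplier = (
    training_config["num_gpus"]
    * training_config["gradient_accumulation_steps"]
)
```

Note that the per-device batch size, learning rate, and library versions are absent from the quoted material, which is consistent with the "Software Dependencies: No" finding above.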