A Survey on Large Language Model Acceleration based on KV Cache Management

Authors: Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This survey provides a comprehensive overview of KV cache management strategies for LLM acceleration, categorizing them into token-level, model-level, and system-level optimizations. By presenting detailed taxonomies and comparative analyses, it offers insights for researchers and practitioners and supports the development of efficient, scalable KV cache management techniques for the practical deployment of LLMs in real-world applications. The survey also overviews the text and multi-modal datasets and benchmarks used to evaluate these strategies.
Researcher Affiliation | Academia | 1 The Hong Kong Polytechnic University; 2 The Hong Kong University of Science and Technology; 3 Huazhong University of Science and Technology; 4 The Chinese University of Hong Kong; 5 Nanyang Technological University. Emails: EMAIL EMAIL EMAIL EMAIL EMAIL EMAIL EMAIL EMAIL EMAIL
Pseudocode | No | The paper contains mathematical equations and descriptive text for algorithms, but no clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured, step-by-step instructions in a code-like format.
Open Source Code | No | The curated paper list for KV cache management is at https://github.com/TreeAI-Lab/Awesome-KV-Cache-Management. This link points to a curated paper list, not to open-source code implementing a methodology described in the survey; as a survey, the paper reviews existing work and does not propose new computational methods requiring a code release.
Open Datasets | Yes | The survey collects a wide range of long-context datasets, such as NumericBench (Li et al., 2025) and LongBench (Bai et al., 2023), and categorizes them into tasks including question answering, text summarization, text reasoning, text retrieval, text generation, and aggregation. LLaVA-Bench (Liu et al., 2023b) is structured around image, ground-truth textual description, and question-answer triplets, segmented across the COCO and In-The-Wild datasets.
Dataset Splits | No | This paper is a survey that reviews existing KV cache management strategies and benchmarks, rather than presenting new experimental results that would require specific dataset splits for reproduction.
Hardware Specification | No | The paper is a survey of existing research and does not describe experiments performed by its authors; therefore, it does not specify the hardware used for its own experimental runs. Section 6.3 discusses "Hardware-aware Design" in general terms for LLM inference, not for the survey's own experimental setup.
Software Dependencies | No | The paper is a survey and does not present new experimental results requiring specific software dependencies for reproduction. It discusses various software frameworks and libraries in the context of the reviewed works, but not for its own methodology.
Experiment Setup | No | As a survey paper, this work analyzes and categorizes existing research on KV cache management. It does not present new experimental results or detailed experimental setups, such as hyperparameters or system-level training settings, from its own research.
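To make the token-level category above concrete, the following is a minimal, hypothetical sketch of one well-known token-level strategy the survey covers: evicting KV cache entries while always retaining a few leading "attention sink" tokens plus a sliding window of recent tokens. All class and parameter names (`SlidingWindowKVCache`, `n_sink`, `window`) are illustrative assumptions, not from the survey itself.

```python
from collections import deque

class SlidingWindowKVCache:
    """Token-level KV cache eviction sketch (hypothetical names).

    Keeps the first `n_sink` entries plus the most recent `window`
    entries, in the spirit of attention-sink / sliding-window
    strategies categorized as token-level optimizations.
    """

    def __init__(self, n_sink: int = 4, window: int = 8):
        self.n_sink = n_sink
        self.sink: list = []                        # always-kept leading tokens
        self.recent: deque = deque(maxlen=window)   # rolling window of recent tokens

    def append(self, key, value):
        entry = (key, value)
        if len(self.sink) < self.n_sink:
            self.sink.append(entry)
        else:
            self.recent.append(entry)  # deque evicts the oldest entry automatically

    def contents(self):
        return self.sink + list(self.recent)

# Usage: cache 10 tokens but keep only 2 sink + 3 recent entries.
cache = SlidingWindowKVCache(n_sink=2, window=3)
for t in range(10):
    cache.append(f"k{t}", f"v{t}")
kept = [k for k, _ in cache.contents()]
print(kept)  # ['k0', 'k1', 'k7', 'k8', 'k9']
```

The design choice here is that eviction is purely positional; many token-level methods the survey reviews instead score entries by attention weight before evicting, but the cache-size bound works the same way.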