CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation

Authors: Hongxuan Zhang, Yao Zhao, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate that CSR matches the performance of state-of-the-art KV cache quantization algorithms while ensuring robust functionality in memory-constrained environments.
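The paper's headline claim of a 1-bit KV cache implies roughly a 16x memory saving over the usual FP16 cache. A back-of-envelope sketch of that arithmetic (the model dimensions below are assumptions chosen to resemble a Llama-2-7B-class model, not figures taken from the paper):

```python
def kv_cache_bytes(seq_len, bits, layers=32, heads=32, head_dim=128):
    """Approximate KV cache size in bytes for one sequence.

    Assumed dims (layers/heads/head_dim) are illustrative, not from the paper.
    The factor of 2 covers the two cached tensors per layer: keys and values.
    """
    return 2 * layers * heads * head_dim * seq_len * bits / 8

fp16_cache = kv_cache_bytes(seq_len=4096, bits=16)  # 2 GiB at FP16
one_bit_cache = kv_cache_bytes(seq_len=4096, bits=1)  # 128 MiB at 1 bit
```

Under these assumed dimensions, the 1-bit representation shrinks a 2 GiB FP16 cache to 128 MiB, a 16x reduction.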
Researcher Affiliation | Collaboration | Hongxuan Zhang1,2*, Yao Zhao2, Jiaqi Zheng1, Chenyi Zhuang2, Jinjie Gu2, Guihai Chen1 (1Nanjing University, 2Ant Group); EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Neural Dict
Open Source Code | No | The text does not explicitly state that source code for the methodology is openly provided, nor does it provide a direct link to a code repository. Footnotes point to Hugging Face Transformers documentation (a third-party tool) and an arXiv preprint of their extended version, neither of which is a code repository for their specific implementation.
Open Datasets | Yes | We extracted a range of prompts from the wikitext dataset (Merity et al. 2016)... We utilized the LongBench benchmark (Bai et al. 2023), which is a bilingual, multitask benchmark designed to assess the long-context understanding capabilities of LLMs.
Dataset Splits | No | The paper mentions using a 'calibration corpus dataset' and a 'test dataset' for Neural Dict training and evaluation, and the 'LongBench benchmark' for model evaluation. However, it does not specify the exact split percentages, sample counts, or the methodology used to create these splits for any of the datasets.
Hardware Specification | Yes | A single NVIDIA A100 GPU (80 GB) on a machine with 128 GB of memory.
Software Dependencies | No | The paper states that LLMs are based on the 'Hugging Face Transformers library' but does not provide any specific version numbers for this library or any other software dependencies such as Python or PyTorch.
Experiment Setup | Yes | In the experiments, unless stated otherwise, the Value Cache uses sn = 2 and the Key Cache uses sn = 1. For simplicity, CSR-s denotes the MP-level. In the online part of CSR-s, the Guard size per layer is 8192 and the sampling size is 4096 for Llama2 and Baichuan2; for Llama3, the Guard size is 2048 with a sampling size of 1024.
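For reference, the reported hyperparameters can be collected into a single configuration sketch. The dictionary layout and key names below are illustrative assumptions; only the numeric values come from the quoted experiment setup:

```python
# Illustrative configuration sketch of the reported CSR experiment setup.
# Key names are assumptions for readability; values are from the paper's text.
CSR_SETUP = {
    "value_cache_sn": 2,  # sn used for the Value Cache
    "key_cache_sn": 1,    # sn used for the Key Cache
    "online": {           # per-model settings for the online part of CSR-s
        "llama2":    {"guard_size_per_layer": 8192, "sampling_size": 4096},
        "baichuan2": {"guard_size_per_layer": 8192, "sampling_size": 4096},
        "llama3":    {"guard_size_per_layer": 2048, "sampling_size": 1024},
    },
}
```

Such a layout makes the Llama2/Baichuan2 vs. Llama3 difference explicit: Llama3 uses a 4x smaller Guard size and sampling size than the other two models.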