CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation
Authors: Hongxuan Zhang, Yao Zhao, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate that CSR matches the performance of state-of-the-art KV cache quantization algorithms while ensuring robust functionality in memory-constrained environments. |
| Researcher Affiliation | Collaboration | Hongxuan Zhang1,2*, Yao Zhao2, Jiaqi Zheng1, Chenyi Zhuang2, Jinjie Gu2, Guihai Chen1 (1Nanjing University, 2Ant Group) |
| Pseudocode | Yes | Algorithm 1: Neural Dict |
| Open Source Code | No | The text does not explicitly state that source code for the methodology is openly provided, nor does it provide a direct link to a code repository. Footnotes point to Hugging Face Transformers documentation (a third-party tool) and an arXiv preprint of their extended version, neither of which is a code repository for their specific implementation. |
| Open Datasets | Yes | We extracted a range of prompts from the wikitext dataset (Merity et al. 2016)... We utilized the LongBench benchmark (Bai et al. 2023), which is a bilingual and multitask benchmark designed to assess the long-context understanding capabilities of LLMs. |
| Dataset Splits | No | The paper mentions using a 'calibration corpus dataset' and a 'test dataset' for Neural Dict training and evaluation, and the 'Long Bench benchmark' for model evaluation. However, it does not specify the exact split percentages, sample counts, or the methodology used to create these splits for any of the datasets. |
| Hardware Specification | Yes | A single NVIDIA A100 GPU (80GB) on a host with 128GB of memory. |
| Software Dependencies | No | The paper states that LLMs are based on the 'Hugging Face Transformers library' but does not provide any specific version numbers for this library or any other software dependencies like Python or PyTorch. |
| Experiment Setup | Yes | In the experiments, unless stated otherwise, the Value Cache uses sn = 2, and the Key Cache uses sn = 1. For simplicity, CSR-s denotes the method at MP-level s. In CSR's online part, the Guard size per layer is 8192, and the sampling size is 4096 for Llama2 and Baichuan2. For Llama3, the Guard size is 2048, with a sampling size of 1024. |
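The sn parameter above is an MP (matching pursuit) sparsity level: each cached vector is approximated by sn atoms from a learned dictionary (the paper's "Neural Dict"). The following is a minimal, hedged sketch of plain greedy matching pursuit to illustrate what an MP-level of sn means; it is not the paper's implementation, and the dictionary here is an arbitrary stand-in rather than a trained Neural Dict.

```python
import numpy as np

def matching_pursuit(x, dictionary, sn):
    """Greedily approximate x with sn atoms from `dictionary`.

    dictionary: (num_atoms, dim) array; rows are assumed unit-norm atoms.
    Returns (indices, coefficients, residual). Storing only the sn
    (index, coefficient) pairs instead of the dense vector is the kind
    of sparse representation the MP-level controls.
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    indices, coeffs = [], []
    for _ in range(sn):
        scores = dictionary @ residual          # correlation with each atom
        idx = int(np.argmax(np.abs(scores)))    # best-matching atom
        c = float(scores[idx])
        residual = residual - c * dictionary[idx]
        indices.append(idx)
        coeffs.append(c)
    return indices, coeffs, residual
```

With an orthonormal dictionary and sn equal to the number of active atoms, the residual vanishes; in practice a larger sn trades memory for reconstruction fidelity, matching the paper's use of sn = 2 for the Value Cache versus sn = 1 for the Key Cache.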