Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries

Authors: Junhyuck Kim, Jongho Park, Jaewoong Cho, Dimitris Papailiopoulos

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on various models (Llama-3-8B, Llama-3.1-8B-Instruct, Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct, Mistral-7B-Instruct, Qwen2.5-14B-Instruct), using dictionaries trained on WikiText-103, as done in Section 2.3. To assess the effectiveness of Lexico in memory reduction while maintaining long-context understanding, we conduct experiments on selected tasks from LongBench (Bai et al., 2023), following the setup of Liu et al. (2024b). See Table 7 in Appendix B for task details. Additionally, we evaluate generative performance on complex reasoning tasks, such as GSM8K (Cobbe et al., 2021) with 5-shot prompting and MMLU-Pro Engineering/Law (Wang et al., 2024a) with zero-shot chain-of-thought.
Researcher Affiliation | Collaboration | 1KRAFTON, 2University of Wisconsin-Madison, 3Microsoft Research. Correspondence to: Dimitris Papailiopoulos <EMAIL>.
Pseudocode | Yes | Algorithm 1 illustrates a naive implementation of OMP for understanding. In Lexico, we adopt the OMP v0 implementation proposed by Zhu et al. (2020), which minimizes computational complexity using efficient inverse Cholesky factorization. Additionally, we integrate methods from Lubonja et al. (2024) for batched GPU execution and extend the implementation to handle multiple dictionaries in parallel. Algorithm 1 OMP ... Algorithm 2 Prefilling and decoding with Lexico
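For intuition, the naive OMP variant referenced above can be sketched in a few lines of NumPy. This is an illustrative toy version (greedy atom selection followed by a least-squares refit on the selected support each step), not the Cholesky-based batched GPU implementation Lexico actually uses; the function name and argument layout are assumptions for this sketch.

```python
import numpy as np

def omp(dictionary, y, s):
    """Naive Orthogonal Matching Pursuit.

    Approximates vector y as a combination of at most s atoms,
    where each row of `dictionary` is a unit-norm atom.
    Returns the selected atom indices and their coefficients.
    """
    residual = y.astype(float)
    support = []
    coeffs = np.zeros(0)
    for _ in range(s):
        # Greedy step: pick the atom most correlated with the residual.
        k = int(np.argmax(np.abs(dictionary @ residual)))
        if k in support:  # no new atom helps; residual is (near) zero
            break
        support.append(k)
        # Refit all coefficients on the current support by least squares.
        sub = dictionary[support].T            # shape (d, |support|)
        coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ coeffs
    return support, coeffs
```

With an orthonormal dictionary (e.g., the identity), a 2-sparse signal is recovered exactly in two iterations; the Cholesky-based variants cited above compute the same least-squares refits incrementally instead of from scratch.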
Open Source Code | Yes | Our code is available at https://github.com/krafton-ai/lexico.
Open Datasets | Yes | For our experiments, we train a dictionary on WikiText-103 (Merity, 2016) for each model. This dictionary is only trained once and used universally across all tasks. ... We conduct experiments on selected tasks from LongBench (Bai et al., 2023)... Additionally, we evaluate generative performance on complex reasoning tasks, such as GSM8K (Cobbe et al., 2021) with 5-shot prompting and MMLU-Pro Engineering/Law (Wang et al., 2024a) with zero-shot chain-of-thought.
Dataset Splits | No | The paper mentions training dictionaries on WikiText-103 and evaluating on tasks such as GSM8K with 5-shot prompting and MMLU-Pro with zero-shot chain-of-thought, but it does not specify the train/test/validation splits (e.g., percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning for these experiments or for dictionary training.
Hardware Specification | Yes | Table 1 summarizes training time for Llama-3.1-8B-Instruct on a single NVIDIA A100 at different sparsity s and dictionary size N.
Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' as an optimizer and refers to academic papers for the OMP implementation details (Zhu et al., 2020; Lubonja et al., 2024), but it does not specify any software libraries or frameworks with exact version numbers (e.g., Python 3.x, PyTorch 1.x) used to implement the methodology.
Experiment Setup | Yes | The dictionaries are trained on KV pairs generated from the WikiText-103 dataset using Adam (Kingma & Ba, 2014) with a learning rate of 0.0001 and a cosine decay schedule over 20 epochs. ... For both experiments, Lexico uses a dictionary size of N = 4096, a buffer size of nb = 128, and an approximation window size na = 1, compressing the oldest token with each new token generated. For KIVI-4 and KIVI-2, we use a quantization group size of g = 32 and a buffer size of nb = 128 ... For GSM8K and MMLU-Pro, we test for stronger memory savings, so we use g = 64 and nb = 64 for KIVI.
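As a rough illustration of what the quantization group size g controls in the KIVI baselines, here is a minimal NumPy sketch of per-group asymmetric uniform quantization: each contiguous group of g values shares one scale and zero-point, so smaller g means finer adaptation at the cost of more metadata. The function names and tensor layout are illustrative assumptions, not KIVI's actual implementation.

```python
import numpy as np

def group_quantize(x, bits=2, g=32):
    """Quantize a flat array to `bits` bits with per-group scale/offset.

    Each group of g consecutive values gets its own min (offset) and
    scale, mapping the group's range onto {0, ..., 2**bits - 1}.
    """
    levels = 2 ** bits - 1
    groups = x.reshape(-1, g)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    # Guard constant groups against a zero scale.
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.clip(np.round((groups - lo) / scale), 0, levels)
    return q.astype(np.uint8), scale, lo

def group_dequantize(q, scale, lo):
    """Reconstruct the flat array from codes plus per-group metadata."""
    return (q * scale + lo).reshape(-1)
```

Round-to-nearest guarantees the per-element reconstruction error is at most half a quantization step (scale / 2) within each group, which is why a larger g (as in the g = 64 GSM8K/MMLU-Pro setting) saves metadata memory but coarsens the error bound when a group spans a wider value range.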