ReAttention: Training-Free Infinite Context with Finite Attention Scope
Authors: Xiaoran Liu, Ruixiao Li, Zhigeng Liu, Qipeng Guo, Yuerong Song, Kai Lv, Hang Yan, Linlin Li, Qun Liu, Xipeng Qiu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the performance of ReAttention on LongBench, L-Eval, and InfiniteBench and demonstrate that it is on par with traditional methods. Furthermore, we also apply ReAttention on mainstream LLMs, including LLaMA3.1-8B and Mistral-v0.3-7B, enabling them to support context lengths of at least 1M and even expanding the context length of LLaMA3.2-3B-chat by 128× to 4M without any further training in Needle-In-A-Haystack tests. We conduct experiments on LLaMA3-8B-8K (Meta, 2024a), LLaMA3.1-8B-128K (Dubey et al., 2024), LLaMA3.1-70B-128K (Dubey et al., 2024), LLaMA3.2-3B-128K (Dubey et al., 2024), Mistral-v0.3-7B-32K (mistralai, 2024), InternLM2.5-7B-1M (InternLM, 2024), Qwen2-7B-128K (Yang et al., 2024a), Qwen2-72B-128K (Yang et al., 2024a), Qwen2-1B-32K (Yang et al., 2024a). |
| Researcher Affiliation | Collaboration | 1School of Computer Science, Fudan University, 2Huawei Noah's Ark Lab, 3Shanghai AI Lab, 4Shanghai Innovation Institute |
| Pseudocode | Yes | The pseudocode of the whole process is detailed in Appendix A. (See Appendix A for Algorithm 1: Prefilling Phase and Algorithm 2: Decoding Phase). |
| Open Source Code | Yes | The code is available at https://github.com/OpenMOSS/ReAttention. |
| Open Datasets | Yes | We validate the performance of ReAttention on LongBench, L-Eval, and InfiniteBench and demonstrate that it is on par with traditional methods. Furthermore, we also apply ReAttention on mainstream LLMs, including LLaMA3.1-8B and Mistral-v0.3-7B, enabling them to support context lengths of at least 1M and even expanding the context length of LLaMA3.2-3B-chat by 128× to 4M without any further training in Needle-In-A-Haystack tests. |
| Dataset Splits | Yes | We first evaluate all 9 LLMs on the commonly used long-context benchmarks LongBench (Bai et al., 2023) and L-Eval (An et al., 2023), with a default context length of 32K and middle truncation. We validate our method on InfiniteBench (Zhang et al., 2024c), a more challenging benchmark with a longer context length. We choose 3 commonly tested subtasks, En.MC, En.QA, and En.Sum, and evaluate models with varying context lengths. |
| Hardware Specification | Yes | We perform experiments on 8 A100 GPUs and extend the context lengths of LLMs with Re Attention to at least 1M tokens. All experiments were conducted on a system with a 48-core CPU, 256GB RAM, and an A800-80GB GPU. |
| Software Dependencies | Yes | All experiments are performed with FP16 precision and accelerated with FlashAttention-2 (Dao, 2023). We use Triton (Tillet et al., 2019), a GPU programming language, to minimize read and write overheads in top-k attention. |
| Experiment Setup | Yes | For all models, we set the length of Kglobal to 32, the length of Klocal to 4096, and the selected span size to 32. Moreover, we set k = 4, k = 127 in top-k attention. Importantly, the attention scope in each step remains within the maximum attention window. For example, for LLaMA3-8B-8K with ReAttention, the maximum attention scope size is 32 + 4096 + 127 × 32, which exactly matches the maximum supported attention window of 8192. We use OpenCompass (Contributors, 2023b) for validation. All experiments are performed with FP16 precision and accelerated with FlashAttention-2 (Dao, 2023). |
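The attention-scope budget quoted in the Experiment Setup row can be verified with a few lines of arithmetic. This is a minimal sketch using the values from that row (the variable names are illustrative, not from the paper's code):

```python
# Attention-scope budget for LLaMA3-8B-8K with ReAttention,
# using the settings quoted in the Experiment Setup row.
GLOBAL_LEN = 32   # length of Kglobal
LOCAL_LEN = 4096  # length of Klocal
TOP_K = 127       # k in top-k attention
SPAN_SIZE = 32    # selected span size

# Global sink tokens + local window + k selected spans of SPAN_SIZE tokens each.
max_scope = GLOBAL_LEN + LOCAL_LEN + TOP_K * SPAN_SIZE
print(max_scope)  # 8192, the maximum supported attention window of LLaMA3-8B-8K
```

The budget is tight by construction: with a span size of 32, k = 127 is the largest top-k value that keeps 32 + 4096 + k × 32 within the 8192-token window.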