ParallelComp: Parallel Long-Context Compressor for Length Extrapolation
Authors: Jing Xiong, Jianghan Shen, Chuanyang Zheng, Zhongwei Wan, Chenyang Zhao, Chiwun Yang, Fanghua Ye, Hongxia Yang, Lingpeng Kong, Ngai Wong
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that PARALLELCOMP enables an 8B model (trained on an 8K context) to achieve 91.17% of GPT-4's performance under ultra-long contexts, outperforming closed-source models such as Claude-2 and Kimi-Chat. |
| Researcher Affiliation | Collaboration | 1The University of Hong Kong, 2Nanjing University, 3The Chinese University of Hong Kong, 4The Ohio State University, 5The University of California, Los Angeles, 6Sun Yat-Sen University, 7Tencent, 8Hong Kong Polytechnic University. |
| Pseudocode | No | The paper describes methods in text and uses diagrams (e.g., Figure 2) to illustrate processes, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release the code at https://github.com/menik1126/ParallelComp. |
| Open Datasets | Yes | We compare our method with existing length extrapolation approaches... on LongBench (Bai et al., 2023) and InfiniteBench (Zhang et al., 2024)... We present the results of perplexity (PPL) calculations on the NarrativeQA (Kočiský et al., 2018) test set. |
| Dataset Splits | Yes | We present the results of perplexity (PPL) calculations on the NarrativeQA (Kočiský et al., 2018) test set. |
| Hardware Specification | Yes | enabling 8B-parameter LLMs to extrapolate from 8K to 128K tokens on a single A100 80GB GPU |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For the hyperparameter τ, on LongBench we retain 3 chunks from the priority queue, except on PRe, where we retain only 1 chunk. On InfiniteBench, we retain 1 chunk for retrieval tasks and 3 chunks for other tasks from the priority queue. In all datasets, the context length of each chunk, including the query, is the maximum pre-training length of the model. Rs is obtained from the first 100 tokens of the chunk, Rr is obtained from the last 100 tokens of the chunk, and the remaining part of the chunk yields Rm. |
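The Rs/Rr/Rm partitioning quoted in the experiment-setup cell can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `split_chunk` and the variable names are hypothetical, not taken from the released code, which should be consulted for the actual implementation.

```python
# Hypothetical sketch of the chunk partitioning described in the setup:
# Rs = first 100 tokens, Rr = last 100 tokens, Rm = everything in between.

def split_chunk(token_ids, boundary=100):
    """Partition one chunk's token ids into (Rs, Rm, Rr) regions."""
    if len(token_ids) <= 2 * boundary:
        # Degenerate case: the chunk is too short for a distinct middle region.
        return token_ids[:boundary], [], token_ids[boundary:]
    rs = token_ids[:boundary]            # first 100 tokens  -> Rs
    rr = token_ids[-boundary:]           # last 100 tokens   -> Rr
    rm = token_ids[boundary:-boundary]   # remaining tokens  -> Rm
    return rs, rm, rr

# Example: a chunk sized to an assumed 8K pre-training length.
chunk = list(range(8192))
rs, rm, rr = split_chunk(chunk)
print(len(rs), len(rm), len(rr))  # 100 7992 100
```

The three regions are disjoint and cover the chunk exactly, matching the paper's description that Rs, Rr, and Rm together account for every token in the chunk.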