An Efficient Private GPT Never Autoregressively Decodes
Authors: Zhengyi Li, Yue Guan, Kang Yang, Yu Feng, Ning Liu, Yu Yu, Jingwen Leng, Minyi Guo
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate a 2.1×–6.0× speedup over standard decoding across three pairs of public-private models and different network conditions. |
| Researcher Affiliation | Academia | Shanghai Jiao Tong University; Shanghai Qizhi Institute; State Key Laboratory of Cryptology. Correspondence to: Jingwen Leng <EMAIL>, Kang Yang <EMAIL>, Yu Yu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Privately Reject Draft Tokens |
| Open Source Code | No | The paper does not provide an explicit statement about releasing their own implementation code or a direct link to a code repository for the POST approach. It mentions using existing frameworks like SecretFlow-SPU and protocols like BumbleBee and Nimbus, but these are third-party tools or prior works. |
| Open Datasets | Yes | We evaluate performance across four diverse tasks: Text-to-SQL (Spider) (Yu et al., 2018), graduate school math (GSM8K) (Cobbe et al., 2021), Python code generation (Code-search-Python) (Husain et al., 2019), and financial question answering (Alpaca-finance) (Gaurang Bharti, 2024). |
| Dataset Splits | No | The paper discusses different tasks and models but does not explicitly state the training, validation, or test dataset splits (e.g., percentages or specific counts) for these datasets or for the knowledge distillation process. |
| Hardware Specification | No | Only high-level specifications are given: performance evaluations are conducted on two nodes with 64 vCPUs and 128 GB memory. |
| Software Dependencies | No | The paper mentions using 'BumbleBee (Lu et al., 2025) and Nimbus (Li et al., 2024b)' and 'SecretFlow-SPU (Ma et al., 2023)' as frameworks and protocols, but does not provide specific version numbers for these or other key software components (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | No | The paper mentions simulating network conditions ('(1 Gbps, 10 ms) for LAN and (400 Mbps, 40 ms) for WAN') and using cross-entropy for model alignment, but it does not specify concrete hyperparameters like learning rates, batch sizes, optimizers, or number of epochs for model training or fine-tuning, which are crucial for reproducing the experimental setup. |
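For context on the Pseudocode row above: the paper's Algorithm 1 ("Privately Reject Draft Tokens") builds on speculative decoding, where a cheap draft model proposes tokens and the target model accepts or rejects them. The sketch below shows only the *generic* speculative-sampling rejection test (in the style of standard speculative decoding), not the paper's private two-party protocol; all names and shapes are illustrative assumptions.

```python
import numpy as np

def reject_draft_tokens(p_target, p_draft, draft_tokens, rng=None):
    """Generic speculative-decoding acceptance test (illustrative sketch).

    p_target, p_draft: (k, vocab) arrays giving the target and draft model
    distributions at each of the k draft positions. Returns the accepted
    prefix; on the first rejection, one corrective token is sampled from
    the residual distribution max(0, p_target - p_draft), renormalized.
    """
    rng = rng or np.random.default_rng()
    accepted = []
    for i, tok in enumerate(draft_tokens):
        # Accept draft token with probability min(1, p_target/p_draft).
        if rng.random() < min(1.0, p_target[i, tok] / p_draft[i, tok]):
            accepted.append(int(tok))
        else:
            # Rejected: resample from the renormalized residual distribution.
            residual = np.maximum(p_target[i] - p_draft[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(residual), p=residual)))
            break
    return accepted
```

In the paper's setting this test must run under secure computation so that neither party learns the other's model outputs; the plaintext version here is only meant to show why accepted tokens cost no extra autoregressive target-model rounds.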