An Efficient Private GPT Never Autoregressively Decodes

Authors: Zhengyi Li, Yue Guan, Kang Yang, Yu Feng, Ning Liu, Yu Yu, Jingwen Leng, Minyi Guo

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate a 2.1×–6.0× speedup compared to standard decoding across three pairs of public-private models and different network conditions.
Researcher Affiliation | Academia | Shanghai Jiao Tong University; Shanghai Qizhi Institute; State Key Laboratory of Cryptology. Correspondence to: Jingwen Leng <EMAIL>, Kang Yang <EMAIL>, Yu Yu <EMAIL>.
Pseudocode | Yes | Algorithm 1 (Privately Reject Draft Tokens)
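The paper's Algorithm 1 is not reproduced in this report. For context only, below is a minimal sketch of the standard (non-private) speculative-decoding rejection rule that such an algorithm would evaluate under secure computation, assuming the usual accept probability min(1, p_target/p_draft); function and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

def reject_draft_tokens(draft_tokens, p_draft, p_target, rng):
    """Sketch of the standard speculative-decoding rejection rule.

    Accept draft token t at position i with probability
    min(1, p_target[i][t] / p_draft[i][t]); at the first rejection,
    resample from the residual distribution max(0, p_target - p_draft)
    and stop. p_draft / p_target: (num_drafts, vocab_size) arrays of
    each model's next-token distribution. Returns the accepted prefix
    (plus one corrected token if a rejection occurred).
    """
    accepted = []
    for i, t in enumerate(draft_tokens):
        if rng.random() < min(1.0, p_target[i][t] / p_draft[i][t]):
            accepted.append(t)  # draft token accepted
        else:
            # Resample from the normalized residual distribution.
            residual = np.maximum(p_target[i] - p_draft[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(residual), p=residual)))
            break
    return accepted
```

The private variant in the paper performs this accept/reject test without revealing the draft or target distributions to either party; the sketch above only shows the plaintext logic being protected.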
Open Source Code | No | The paper does not provide an explicit statement about releasing their own implementation code or a direct link to a code repository for the proposed approach. It mentions using existing frameworks like SecretFlow-SPU and protocols like BumbleBee and Nimbus, but these are third-party tools or prior works.
Open Datasets | Yes | We evaluate performance across four diverse tasks: Text-to-SQL (Spider) (Yu et al., 2018), graduate school math (GSM8K) (Cobbe et al., 2021), Python code generation (CodeSearchNet-Python) (Husain et al., 2019), and financial question answering (Alpaca-finance) (Gaurang Bharti, 2024).
Dataset Splits | No | The paper discusses different tasks and models but does not explicitly state the training, validation, or test dataset splits (e.g., percentages or specific counts) for these datasets or for the knowledge distillation process.
Hardware Specification | No | Performance evaluations are conducted on two nodes with 64 vCPUs and 128 GB memory.
Software Dependencies | No | The paper mentions using 'BumbleBee (Lu et al., 2025) and Nimbus (Li et al., 2024b)' and 'SecretFlow-SPU (Ma et al., 2023)' as frameworks and protocols, but does not provide specific version numbers for these or other key software components (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | No | The paper mentions simulating network conditions ('(1 Gbps, 10 ms) for LAN and (400 Mbps, 40 ms) for WAN') and using cross-entropy for model alignment, but it does not specify concrete hyperparameters such as learning rates, batch sizes, optimizers, or number of epochs for model training or fine-tuning, which are crucial for reproducing the experimental setup.
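The alignment objective is only named as cross-entropy; as a point of reference, here is a minimal sketch of what such a distillation-style alignment loss typically looks like. The `temperature` parameter is an assumption for illustration, not a hyperparameter reported in the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def alignment_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student model against the teacher's
    (optionally temperature-softened) next-token distribution, a common
    way to align a small draft model with a large target model.
    NOTE: `temperature` is an illustrative knob; the paper does not
    report its training hyperparameters.
    """
    p_teacher = softmax(teacher_logits / temperature)
    logp_student = np.log(softmax(student_logits / temperature))
    return float(-(p_teacher * logp_student).sum(axis=-1).mean())
```

When the student's logits match the teacher's, this loss reduces to the teacher distribution's entropy, which is its minimum for a fixed teacher.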