CipherPrune: Efficient and Scalable Private Transformer Inference
Authors: Yancheng Zhang, Jiaqi Xue, Mengxin Zheng, Mimi Xie, Mingzhe Zhang, Lei Jiang, Qian Lou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that CipherPrune reduces the execution overhead of private Transformer inference by approximately 6.1× for 128-token inputs and 10.6× for 512-token inputs, compared to previous methods, with only a marginal drop in accuracy. The code is publicly available at https://github.com/UCF-Lou-Lab-PET/cipher-prune-inference. (Section 4, Experiments) |
| Researcher Affiliation | Collaboration | Yancheng Zhang1, Jiaqi Xue1, Mengxin Zheng1, Mimi Xie2, Mingzhe Zhang3, Lei Jiang4, Qian Lou1* 1University of Central Florida 2University of Texas at San Antonio 3Ant Research 4Indiana University Bloomington |
| Pseudocode | Yes | Algorithm 1 Crypto-aware Thresholds Learning |
| Open Source Code | Yes | The code is publicly available at https://github.com/UCF-Lou-Lab-PET/cipher-prune-inference. |
| Open Datasets | Yes | Similar to prior work (Pang et al., 2024), we fine-tune the BERT models on four downstream NLP tasks in GLUE benchmarks (Wang et al., 2018): the Multi-Genre Natural Language Inference Corpus (MNLI), the Stanford Question Answering Dataset (QNLI), the Stanford Sentiment Treebank (SST-2), and the Microsoft Research Paraphrase Corpus (MRPC). |
| Dataset Splits | Yes | Similar to prior work (Pang et al., 2024), we fine-tune the BERT models on four downstream NLP tasks in GLUE benchmarks (Wang et al., 2018): the Multi-Genre Natural Language Inference Corpus (MNLI), the Stanford Question Answering Dataset (QNLI), the Stanford Sentiment Treebank (SST-2), and the Microsoft Research Paraphrase Corpus (MRPC). |
| Hardware Specification | Yes | All experiments are conducted on an AMD Ryzen Threadripper PRO 3955WX (2.2GHz, 125GB RAM), and fine-tuning of the BERT model with threshold learning is done on NVIDIA GeForce RTX 3090 GPUs with CUDA 11.0.3. |
| Software Dependencies | Yes | CipherPrune uses the EzPC (EzPC, 2023) framework and the SEAL (SEAL, 2023) library. EzPC compiles TensorFlow-based deep neural networks into secure computation protocols running on cryptographic backends... fine-tuning of the BERT model with threshold learning is done on NVIDIA GeForce RTX 3090 GPUs with CUDA 11.0.3. |
| Experiment Setup | Yes | Algorithm 1 Crypto-aware Thresholds Learning Input: pre-trained Transformer M, training data D, initial thresholds θ, β... The hyperparameters λ and α dictate the extent of pruning and approximation, with higher values leading to increased pruning or approximation. In Figure 12, we show the accuracy-latency trade-off for the BERT-Base model under different parameter settings. Larger λ and α result in more tokens being pruned or reduced. With λ less than 0.05, an appropriate ratio of tokens is pruned, maintaining a stable accuracy of around 90%. Smaller α leads to more tokens being computed with high-degree polynomials, which increases accuracy but also latency. |
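The excerpt above describes how Algorithm 1 learns pruning thresholds θ under a hyperparameter λ that controls how aggressively tokens are pruned. The full algorithm is not reproduced in this table, so the following is only a toy sketch of the general idea: a differentiable (sigmoid-relaxed) keep/prune decision per token, with a surrogate loss in which keeping a token "costs" λ but "earns" its importance score, so larger λ drives the learned threshold up and prunes more tokens. All function names, the loss form, and the importance scores here are hypothetical illustrations, not the paper's actual implementation.

```python
import math

def soft_keep_mask(score, theta, temp=0.1):
    # Differentiable relaxation of the hard rule "keep token if score > theta".
    return 1.0 / (1.0 + math.exp(-(score - theta) / temp))

def pruning_objective(scores, theta, lam):
    # Toy surrogate loss (assumption, not the paper's loss): keeping a token
    # costs lam but earns its importance score, so a larger lam favors
    # pruning more tokens -- mirroring the quoted observation that higher
    # lambda leads to increased pruning.
    return sum(soft_keep_mask(s, theta) * (lam - s) for s in scores)

def learn_threshold(scores, lam, lr=0.05, steps=300):
    # Gradient descent on theta, using a central finite difference so the
    # sketch stays dependency-free.
    theta, eps = 0.0, 1e-4
    for _ in range(steps):
        g = (pruning_objective(scores, theta + eps, lam)
             - pruning_objective(scores, theta - eps, lam)) / (2 * eps)
        theta -= lr * g
    return theta

scores = [0.9, 0.8, 0.6, 0.3, 0.1, 0.05]  # made-up token importance scores
theta_low = learn_threshold(scores, lam=0.2)
theta_high = learn_threshold(scores, lam=0.8)
assert theta_high > theta_low  # larger lambda -> higher threshold -> more pruning
```

With λ = 0.2 the learned threshold settles between the low- and mid-importance scores, keeping most tokens; with λ = 0.8 it rises past all but the highest scores, consistent with the quoted accuracy-latency trade-off in which λ below 0.05 keeps accuracy stable around 90%.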