CipherPrune: Efficient and Scalable Private Transformer Inference

Authors: Yancheng Zhang, Jiaqi Xue, Mengxin Zheng, Mimi Xie, Mingzhe Zhang, Lei Jiang, Qian Lou

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that CipherPrune reduces the execution overhead of private Transformer inference by approximately 6.1× for 128-token inputs and 10.6× for 512-token inputs, compared to previous methods, with only a marginal drop in accuracy. The code is publicly available at https://github.com/UCF-Lou-Lab-PET/cipher-prune-inference. (Section 4: Experiments)
Researcher Affiliation | Collaboration | Yancheng Zhang1, Jiaqi Xue1, Mengxin Zheng1, Mimi Xie2, Mingzhe Zhang3, Lei Jiang4, Qian Lou1* — 1University of Central Florida; 2University of Texas at San Antonio; 3Ant Research; 4Indiana University Bloomington
Pseudocode | Yes | Algorithm 1: Crypto-aware Thresholds Learning
Open Source Code | Yes | The code is publicly available at https://github.com/UCF-Lou-Lab-PET/cipher-prune-inference.
Open Datasets | Yes | Similar to prior work (Pang et al., 2024), we fine-tune the BERT models on four downstream NLP tasks in GLUE benchmarks (Wang et al., 2018): the Multi-Genre Natural Language Inference Corpus (MNLI), the Stanford Question Answering Dataset (QNLI), the Stanford Sentiment Treebank (SST-2), and the Microsoft Research Paraphrase Corpus (MRPC).
Dataset Splits | Yes | Similar to prior work (Pang et al., 2024), we fine-tune the BERT models on four downstream NLP tasks in GLUE benchmarks (Wang et al., 2018): the Multi-Genre Natural Language Inference Corpus (MNLI), the Stanford Question Answering Dataset (QNLI), the Stanford Sentiment Treebank (SST-2), and the Microsoft Research Paraphrase Corpus (MRPC).
Hardware Specification | Yes | All experiments are conducted on an AMD Ryzen Threadripper PRO 3955WX (2.2 GHz, 125 GB RAM), and fine-tuning of the BERT model with threshold learning is done on NVIDIA GeForce RTX 3090 GPUs with CUDA 11.0.3.
Software Dependencies | Yes | CipherPrune uses the EzPC (EzPC, 2023) framework and the SEAL (SEAL, 2023) library. EzPC compiles TensorFlow-based deep neural networks into secure computation protocols running on cryptographic backends... Fine-tuning of the BERT model with threshold learning is done on NVIDIA GeForce RTX 3090 GPUs with CUDA 11.0.3.
Experiment Setup | Yes | Algorithm 1: Crypto-aware Thresholds Learning. Input: pre-trained Transformer M, training data D, initial thresholds θ, β... The hyperparameters λ and α dictate the extent of pruning and approximation, with higher values leading to increased pruning or approximation. In Figure 12, we show the accuracy-latency trade-off for the BERT-Base model under different parameter settings. Larger λ and α result in more tokens being pruned or reduced. With λ less than 0.05, an appropriate ratio of tokens is pruned, maintaining a stable accuracy of around 90%. Smaller α leads to more tokens being computed with high-degree polynomials, which increases accuracy but also latency.
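To make the role of λ and α concrete, the sketch below shows one plausible form of a crypto-aware threshold-learning objective: a task loss regularized by a pruning penalty (scaled by λ) and a high-degree-polynomial approximation penalty (scaled by α), with a sigmoid relaxation so the thresholds remain trainable. This is a hypothetical illustration, not the authors' implementation; the function names, the soft-mask temperature, and the exact penalty forms are assumptions.

```python
import numpy as np

def soft_prune_mask(importance, theta, temp=10.0):
    """Differentiable keep-mask: tokens whose importance score exceeds the
    learned threshold theta are (softly) kept. The sigmoid relaxation lets
    gradients flow into theta during fine-tuning (assumed design, not the
    paper's exact formulation)."""
    return 1.0 / (1.0 + np.exp(-temp * (importance - theta)))

def threshold_learning_loss(task_loss, keep_mask, high_degree_mask,
                            lambda_=0.05, alpha=0.01):
    """Assumed total objective: task loss plus a penalty on the fraction of
    tokens kept (scaled by lambda_, so larger lambda_ prunes more) and a
    penalty on the fraction of tokens computed with expensive high-degree
    polynomial approximations (scaled by alpha, so larger alpha pushes more
    tokens to cheap low-degree approximations)."""
    prune_penalty = keep_mask.mean()          # fraction of tokens kept
    approx_penalty = high_degree_mask.mean()  # fraction using high-degree polys
    return task_loss + lambda_ * prune_penalty + alpha * approx_penalty

# Toy usage: random per-token importance scores for a batch of 8 sequences
# of 128 tokens, a single shared threshold of 0.5.
rng = np.random.default_rng(0)
importance = rng.random((8, 128))
mask = soft_prune_mask(importance, theta=0.5)
high_degree = (mask > 0.9).astype(float)
loss = threshold_learning_loss(task_loss=1.0, keep_mask=mask,
                               high_degree_mask=high_degree)
```

Under this form, the trade-off reported above falls out directly: raising λ makes keeping tokens more costly (more pruning, lower latency, some accuracy loss), while lowering α makes high-degree polynomial evaluation cheaper in the objective (more accuracy, more latency).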