Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy

Authors: Haoqi Wu, Wei Dai, Li Wang, Qiang Yan

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across multiple datasets, along with ablation studies, demonstrate that Cape achieves a better privacy-utility tradeoff compared to prior state-of-the-art works.
Researcher Affiliation | Industry | Haoqi Wu, Wei Dai, Li Wang, Qiang Yan (TikTok). Correspondence to: Haoqi Wu <EMAIL>.
Pseudocode | Yes | Algorithm 1 (Equal-Width Bucketing, Bucket): Input: utility score vector u, number of buckets Nb. Output: a set of buckets B with different tokens. ... Algorithm 2 (Cape Mechanism, R): Input: prompt x = {t1, t2, ..., tn}, device model M, embedding table e, importance factors λL, λD, bucket number Nb, and budget ε > 0. Output: perturbed prompt x̂ = R(x).
Open Source Code | No | The paper does not provide a direct link to a code repository or an explicit statement of code release for the methodology described in this paper.
Open Datasets | Yes | For the former, we follow prior works (Yue et al., 2021; Chen et al., 2022) to use two GLUE datasets with privacy implications. 1) SST-2: This contains sentiment annotations for movie reviews...; 2) QNLI: This is a dataset containing sentence pairs for binary classification... For the latter, we follow (Tong et al., 2023) to use Wikitext-103-v1, a large-scale dataset derived from Wikipedia articles for language modeling tasks.
Dataset Splits | Yes | For the former, we follow prior works (Yue et al., 2021; Chen et al., 2022) to use two GLUE datasets with privacy implications. 1) SST-2: This contains sentiment annotations for movie reviews, which is used to perform sentiment classification (positive or negative); 2) QNLI: This is a dataset containing sentence pairs for binary classification (entailment/not entailment). We use accuracy as the metric. ... For the latter, we follow (Tong et al., 2023) to use Wikitext-103-v1, a large-scale dataset derived from Wikipedia articles for language modeling tasks. ... on the validation set of SST-2 and QNLI datasets.
Hardware Specification | Yes | All the experiments are carried out on one Debian 11 machine equipped with one Intel Xeon Platinum 8260 CPU (6 cores, 2.40 GHz), 16 GB of RAM, and 4 Nvidia Tesla V100-SXM2-32GB GPUs.
Software Dependencies | No | The paper mentions 'one Debian 11 machine' as the operating system but does not provide specific version numbers for other software dependencies such as programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or CUDA.
Experiment Setup | Yes | By default, we set λL = 0.5, λD = 1.0 and Nb = 50. We run inference on the original data as non-private baseline. ... (default temperature of 0.5).
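The pseudocode and setup rows above name the key ingredients: equal-width bucketing of utility scores into Nb = 50 buckets (Algorithm 1) and a perturbation mechanism with privacy budget ε (Algorithm 2). The sketch below illustrates what such a pipeline could look like; the function names, the degenerate-case handling, and the use of the standard exponential mechanism for bucket selection are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def equal_width_bucket(u, n_buckets=50):
    """Equal-width bucketing (sketch of Algorithm 1, details assumed).

    Splits the utility range [min(u), max(u)] into n_buckets intervals
    of equal width and returns, for each bucket, the indices of the
    tokens whose scores fall inside it.
    """
    u = np.asarray(u, dtype=float)
    lo, hi = u.min(), u.max()
    if hi == lo:  # degenerate case: all scores identical -> one bucket
        return [list(range(len(u)))] + [[] for _ in range(n_buckets - 1)]
    width = (hi - lo) / n_buckets
    # Clip so the maximum score lands in the last bucket rather than
    # an out-of-range index.
    idx = np.clip(((u - lo) / width).astype(int), 0, n_buckets - 1)
    return [np.flatnonzero(idx == b).tolist() for b in range(n_buckets)]

def sample_bucket_dp(bucket_utils, epsilon, sensitivity=1.0, rng=None):
    """Pick a bucket index via the standard exponential mechanism.

    Bucket b is chosen with probability proportional to
    exp(epsilon * u_b / (2 * sensitivity)), so higher-utility buckets
    are favoured while the choice stays epsilon-DP with respect to the
    utility scores. The paper's exact sampling rule may differ.
    """
    rng = rng or np.random.default_rng()
    u = np.asarray(bucket_utils, dtype=float)
    logits = epsilon * u / (2.0 * sensitivity)
    logits -= logits.max()  # shift for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return int(rng.choice(len(u), p=p))
```

With the paper's default Nb = 50, a perturbed replacement for each token could then be drawn from the selected bucket; how Cape combines this with the importance factors λL and λD is not specified in the quoted excerpts.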