Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs
Authors: Xun Wang, Jing Xu, Franziska Boenisch, Michael Backes, Christopher A. Choquette-Choo, Adam Dziedzic
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our thorough experimental evaluation on both masked language models and auto-regressive language models demonstrates that our method can efficiently, effectively, and privately transfer soft prompts with high utility. Our code is available at https://github.com/sprintml/POST. |
| Researcher Affiliation | Collaboration | 1CISPA Helmholtz Center for Information Security, Saarbrücken, Germany 2Google DeepMind. Correspondence to: Franziska Boenisch <EMAIL>, Adam Dziedzic <EMAIL>. |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/sprintml/POST. |
| Open Datasets | Yes | Following prior work (Hong et al., 2023; Wu et al., 2023), we evaluate the performance of our proposed method on five classification-task datasets: sst2 from the GLUE benchmark (Wang et al., 2019), imdb (Maas et al., 2011), tweet (Rosenthal et al., 2017), arisetv (Okite, 2022) and mpqa (Wiebe et al., 2005). ... As public data, we also include agnews (Zhang et al., 2015) and boolq (Clark et al., 2019) for the classification task, while for the generation task, we use AIE (Kudari, 2022). We also include disaster (Crowd Flower, 2019) and trec (Li & Roth, 2002) for baseline comparison and ablation on the choice of public data. |
| Dataset Splits | Yes | To evaluate the success of our method, we report the accuracy on the test data split of our private datasets for the teacher LLM with the transferred prompt (Private Transfer). ... Datasets. Following prior work (Hong et al., 2023; Wu et al., 2023), we evaluate the performance of our proposed method on five classification-task datasets: sst2 from the GLUE benchmark (Wang et al., 2019), imdb (Maas et al., 2011), tweet (Rosenthal et al., 2017), arisetv (Okite, 2022) and mpqa (Wiebe et al., 2005). |
| Hardware Specification | Yes | All experiments are executed on a single A100 GPU. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers, such as 'Python 3.x' or 'PyTorch X.Y'. |
| Experiment Setup | Yes | We experiment with various degrees of compression in the KD (see Appendix D.4). For the results presented in the main body of the paper, we compress the 12-layer Roberta-base and the 32-layer Llama2-7b to 2 layers, and the 48-layer GPT2-XL to 4 layers. We use the Bookcorpus (Zhu et al., 2015) dataset for the KD. ... Following Su et al. (2022), we initialize our soft prompts with 100 tokens. The hyperparameters for prompt tuning per dataset, including the δ for the DP setup, are presented in Table 10. ... By default, we use 5000 steps for Roberta-base, 8000 steps for GPT2-XL, and 6000 steps for Llama2-7b. ... Table 8: Hyperparameters in Knowledge Distillation. ... Table 11: Hyperparameters used during Prompt Transfer. ... Table 12: Setting of α for Different Datasets and Models during Prompt Transfer. |
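The experiment-setup row notes that, following Su et al. (2022), the soft prompts are initialized with 100 tokens. A minimal sketch of how such a soft prompt can be represented and prepended to a batch of input embeddings is shown below; the Gaussian initialization, the 768-dimensional embedding size, and the function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def init_soft_prompt(num_tokens=100, embed_dim=768, seed=0):
    # Trainable soft-prompt embeddings. The paper uses 100 tokens
    # (following Su et al., 2022); the Gaussian init and embed_dim
    # here are illustrative assumptions.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 0.02, size=(num_tokens, embed_dim))

def prepend_soft_prompt(soft_prompt, input_embeds):
    # Concatenate the soft prompt in front of each example's token
    # embeddings, so the frozen LLM conditions on it at every step.
    batch = input_embeds.shape[0]
    tiled = np.broadcast_to(soft_prompt, (batch,) + soft_prompt.shape)
    return np.concatenate([tiled, input_embeds], axis=1)

prompt = init_soft_prompt()
batch_embeds = np.zeros((2, 16, 768))  # 2 examples, 16 input tokens each
full = prepend_soft_prompt(prompt, batch_embeds)
print(full.shape)  # (2, 116, 768): 100 prompt tokens + 16 input tokens
```

During prompt tuning only the `soft_prompt` array would receive gradient updates; the backbone model stays frozen.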