Contrastive Private Data Synthesis via Weighted Multi-PLM Fusion
Authors: Tianyuan Zou, Yang Liu, Peng Li, Yufei Xiong, Jianqing Zhang, Jingjing Liu, Xiaozhou Ye, Ye Ouyang, Ya-Qin Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 6 well-developed datasets with 6 open-source and 3 closed-source PLMs demonstrate the superiority of WASP in improving model performance over diverse downstream tasks. Code is available at https://github.com/LindaLydia/WASP. |
| Researcher Affiliation | Collaboration | 1) Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China; 2) The Hong Kong Polytechnic University, Hong Kong, China; 3) Shanghai Artificial Intelligence Laboratory, Shanghai, China; 4) Department of Mathematics, Harbin Institute of Technology, Weihai, Shandong, China; 5) Shanghai Jiao Tong University, Shanghai, China; 6) AsiaInfo Technologies, Shanghai, China. Correspondence to: Yang Liu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 WASP. Input: K PLMs {P_k}_{k=1}^{K} with empty synthetic datasets {D_k}_{k=1}^{K}; 1 data party with private dataset B of size M belonging to C categories; number of in-context samples S; number of iterations T taken to obtain in total N synthetic samples; initialized PLM weights {w_k = 1/K}_{k=1}^{K}; learning rate η; DP privacy parameters ϵ, δ, δ_iter; test dataset A; randomly initialized STM m^{(0)}. Output: STM m. ... Algorithm 2 WASP for Distributed Federated Data (L > 1) ... Algorithm 3 Functions used in Algorithms 1 and 2 for WASP |
| Open Source Code | Yes | Code is available at https://github.com/LindaLydia/WASP. |
| Open Datasets | Yes | We evaluate on 6 widely used tasks: 1) IMDb (Maas et al., 2011) (2 categories) for the movie-review sentiment analysis task; 2) Yelp-Category (Yelp Inc., 2015) (10 categories) for the business-review item field classification task; 3) Yelp-Rating (Yelp Inc., 2015) (5 categories) for the business-review rating classification task; 4) Openreview-Category (Xie et al., 2024) (12 categories) for the paper-review classification by research area task; 5) Openreview-Rating (Xie et al., 2024) (5 categories) for the paper-review classification by review rating task; and 6) Banking (10 categories selected from Banking77 (Casanueva et al., 2020)) for the online-banking queries field classification task. |
| Dataset Splits | Yes | By default, we use 100 private samples (M = 100) for the main experiments. For the federated-data (L > 1) scenario, we use L = 10 private data parties that together control 300 private samples (M = Σ_{l=1}^{10} \|B_l\| = 300). To better align with real-world scenarios, each participating data party controls a private dataset that is non-i.i.d. with respect to the others, and these datasets aggregate into an unbalanced whole. We follow Dirichlet partitioning (Yurochkin et al., 2019; Hsu et al., 2019; Zhang et al., 2023) to distribute private samples to each party with parameter α = 1.0. For the DP synthetic dataset, we generate a total of 6,000 samples from all participating PLMs within 5 iterations. ... B is randomly drawn from the training sets of these datasets, with their test sets used to evaluate the trained STM. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions specific pre-trained models such as GPT-2, Llama-2, Vicuna, OPT, ChatGLM3, Flan-T5, GPT-3.5, GPT-4, GPT-4o, BERT, and sentence-t5-base, but does not provide specific version numbers for the software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | By default, we use 100 private samples (M = 100) for the main experiments. For the federated-data (L > 1) scenario, we use L = 10 private data parties that together control 300 private samples (M = Σ_{l=1}^{10} \|B_l\| = 300). ... For the DP synthetic dataset, we generate a total of 6,000 samples from all participating PLMs within 5 iterations. Since the first iteration does not use private-sample information for feedback, only the last 4 iterations are privacy-sensitive. By default, we use δ_iter = 1×10^{-5} in our experiments and list only ϵ alongside the results. The notion of DP is sample-level DP unless otherwise stated. |
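The Dirichlet partition cited in the Dataset Splits row (α = 1.0, L = 10 parties, M = 300) can be sketched as below. This is a minimal illustration of the standard per-class Dirichlet scheme (Hsu et al., 2019), not code from the WASP repository; the function name `dirichlet_partition` and its arguments are illustrative.

```python
import numpy as np

def dirichlet_partition(labels, num_parties=10, alpha=1.0, seed=0):
    """Split sample indices across parties using a per-class Dirichlet
    prior, yielding non-i.i.d., unbalanced local datasets."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    party_indices = [[] for _ in range(num_parties)]
    for c in np.unique(labels):
        # Shuffle the indices of class c, then draw this class's
        # per-party proportions from Dirichlet(alpha, ..., alpha).
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(num_parties))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for party, chunk in enumerate(np.split(idx, cuts)):
            party_indices[party].extend(chunk.tolist())
    return [np.array(p, dtype=int) for p in party_indices]

# Example matching the paper's federated setting: M = 300 private
# samples over C = 2 categories, split across L = 10 parties.
labels = np.repeat([0, 1], 150)
parts = dirichlet_partition(labels, num_parties=10, alpha=1.0)
```

Smaller α makes the local label distributions more skewed; α = 1.0 (the paper's setting) gives moderately heterogeneous parties while every sample is assigned to exactly one party.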