Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning

Authors: Fengyu Gao, Ruida Zhou, Tianhao Wang, Cong Shen, Jing Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on standard benchmarks and compare AdaDPSyn with the DP few-shot generation algorithm (Tang et al., 2023). The experiments demonstrate that AdaDPSyn not only outperforms DP few-shot generation, but also maintains high accuracy levels close to those of non-private baselines, providing an effective solution for ICL with privacy protection.
Researcher Affiliation | Academia | Fengyu Gao (University of Virginia), Ruida Zhou (University of California, Los Angeles), Tianhao Wang (University of Virginia), Cong Shen (University of Virginia), Jing Yang (University of Virginia)
Pseudocode | Yes | Algorithm 1: AdaDPSyn; Algorithm 2: DP Few-shot Generation (Tang et al., 2023); Algorithm 3: Next Token Generation (Tang et al., 2023); Algorithm 4: Good Radius (Nissim et al., 2016); Algorithm 5: DP Binary Search
Open Source Code | No | The text mentions a third-party tool, the vLLM platform (https://github.com/vllm-project/vllm), but provides no statement or link indicating that the authors' own implementation of AdaDPSyn is open-sourced.
Open Datasets | Yes | We study text classification on three datasets: 4-way news classification AGNews (Zhang et al., 2015), 6-way question classification TREC (Voorhees and Tice, 2000), and 14-way topic classification DBPedia (Zhang et al., 2015). For information extraction, we study the MIT Movies trivia10k13 slot-filling dataset (Liu et al., 2012), which includes movie genre (MIT-G) and director name (MIT-D) as slots.
Dataset Splits | Yes | AGNews: includes 30,000 training samples and 1,900 test samples per class; for our experiments, we randomly select 1,000 samples from the test set. DBPedia: includes 40,000 training samples and 5,000 test samples per class; for our experiments, we randomly select 49,999 samples from the training set and 1,000 samples from the test set. TREC: includes 5,452 training samples and 500 test samples, distributed non-uniformly across the categories. MIT Movies: MIT-G includes 2,953 training and 780 test samples, while MIT-D contains 1,561 training and 415 test samples. We fix n_shots = 4 by default, generating 4-shot demonstrations (randomly without replacement from the label set) for ICL.
Hardware Specification | Yes | The experiments use 4 NVIDIA RTX A5000 GPUs, each with 24,564 MiB of memory. Approximately 24 hours are required to reproduce the main results in Section 5.1.
Software Dependencies | No | The paper mentions specific LLM models (Llama-2-7b-hf, GPT-4o mini) and platforms (OpenAI's API, the vLLM platform), but does not provide version numbers for the software libraries or dependencies used to implement AdaDPSyn.
Experiment Setup | Yes | Hyperparameters in DP few-shot generation are detailed in Table 12, including the number of subsets M, the number of data samples per subset N, the number of tokens Tmax, and the reduced vocabulary size K. To ensure a fair comparison, we select these parameters for DP few-shot generation based on the guidance in Tang et al. (2023), as detailed in Appendix C.5. We then use the same values of M, N, Tmax, and K for our AdaDPSyn without any additional hyperparameter tuning. We conduct a hyperparameter search for AdaDPSyn on λ and T̂ in Appendix C.4. Hyperparameters and privacy parameters for the results presented in Table 2 are provided in Tables 13-17.
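The setup described above involves two data-handling steps: partitioning the private training data into M disjoint subsets of N samples each (the M and N hyperparameters of DP few-shot generation), and drawing n_shots = 4 demonstrations without replacement for ICL. A minimal sketch of how such preprocessing might look is below; the function names and seeding are hypothetical illustrations, not the authors' implementation.

```python
import random


def partition_private_data(records, M, N, seed=0):
    """Split private records into M disjoint subsets of N samples each.

    Illustrates the M (number of subsets) and N (samples per subset)
    hyperparameters described in the experiment setup. Hypothetical
    helper, not the paper's code.
    """
    rng = random.Random(seed)
    pool = list(records)
    rng.shuffle(pool)
    assert len(pool) >= M * N, "not enough records for M subsets of size N"
    return [pool[i * N:(i + 1) * N] for i in range(M)]


def sample_demonstrations(labeled_pool, n_shots=4, seed=0):
    """Draw n_shots demonstrations randomly without replacement,
    mirroring the default n_shots = 4 setting reported above."""
    rng = random.Random(seed)
    return rng.sample(labeled_pool, n_shots)
```

Because the subsets are disjoint, each private record influences at most one subset, which is the property DP aggregation schemes of this kind typically rely on when bounding sensitivity.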