Improving the Variance of Differentially Private Randomized Experiments through Clustering
Authors: Adel Javanmard, Vahab Mirrokni, Jean Pouget-Abadie
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we evaluate the theoretical and empirical performance of our CLUSTER-DP algorithm on both real and simulated data, comparing it to common baselines, including two special cases of our algorithm: its unclustered version and a uniform-prior version. |
| Researcher Affiliation | Collaboration | 1Marshall School of Business, University of Southern California, Los Angeles, USA 2Google Research, New York, USA. Correspondence to: Adel Javanmard <EMAIL>, Jean Pouget-Abadie <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 UNIFORM-PRIOR-DP mechanism Algorithm 2 Our CLUSTER-DP mechanism |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The YouTube social network dataset (Leskovec & Krevl, 2014) contains the friendship links of a set of users on YouTube, and the ground-truth clusters correspond to groups created by users. |
| Dataset Splits | No | The paper describes how data is generated and sampled for experiments (e.g., "super-population of three clusters of sizes 2.5e3, 5e3, and 10e4 units, and repeatedly draw uniformly at random sub-populations of three clusters from these original clusters"). For the YouTube dataset, it mentions "considering only the 50 largest communities". However, it does not provide specific training/test/validation dataset splits with percentages or counts, which are typically used for reproducing machine learning experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) needed to replicate the experiments. |
| Experiment Setup | Yes | Unless otherwise specified, and with no particular reason to fix parameters one way or another, we take K = 5, v = 5, and β = 4.5. We consider C = 3 clusters of sizes 500, 10^3, and 2·10^3, with an equal number of controlled and treated units in each cluster. ... for the CLUSTER-DP mechanism, we set the truncation parameter γ = 0.02, the Laplace noise σ = 10, and the resampling probability λ = 0.8. ... For CLUSTER-DP and CLUSTER-FREE-DP, we set the Laplace parameter to σ = 10, and vary the truncation parameter γ ∈ [0.1/K, 1/K]. ... In the CLUSTER-DP mechanism, we set the truncation threshold to γ = 0.1/K and the Laplace noise level to σ = 5. |
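The experiment-setup row references a truncation parameter γ and a Laplace noise scale σ. As a rough illustration of how these two knobs typically interact in a DP release step, the sketch below truncates a probability vector at γ, adds Laplace noise of scale σ, and projects back to the simplex. This is a hypothetical, simplified sketch for intuition only; the function name, the clip-and-renormalize projection, and the parameter values are assumptions, not the paper's actual CLUSTER-DP mechanism (Algorithm 2), which also involves a resampling probability λ not modeled here.

```python
import numpy as np

def truncate_and_noise(probs, gamma, sigma, rng):
    """Hypothetical sketch: truncate, privatize with Laplace noise, renormalize.

    probs : nonnegative weights summing to 1
    gamma : truncation floor (cf. the paper's truncation parameter)
    sigma : Laplace noise scale (cf. the paper's noise level)
    """
    # Step 1: raise every entry to at least gamma, then renormalize,
    # so no probability is reported as arbitrarily small.
    p = np.maximum(np.asarray(probs, dtype=float), gamma)
    p = p / p.sum()
    # Step 2: add i.i.d. Laplace noise of scale sigma to each entry.
    noisy = p + rng.laplace(scale=sigma, size=p.shape)
    # Step 3: project back to the probability simplex by clipping
    # negatives and renormalizing (falling back to uniform if all
    # mass is clipped away).
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0.0:
        return np.full_like(p, 1.0 / p.size)
    return noisy / total

rng = np.random.default_rng(0)
q = truncate_and_noise([0.7, 0.25, 0.05], gamma=0.1 / 5, sigma=0.5, rng=rng)
```

Smaller γ preserves more of the original distribution but weakens the truncation's variance control, while larger σ strengthens privacy at the cost of noisier estimates, which is the trade-off the paper's γ and σ sweeps explore.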