Leveraging Randomness in Model and Data Partitioning for Privacy Amplification
Authors: Andy Dong, Wei-Ning Chen, Ayfer Ozgur
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that when training a 44 million parameter model with partial model splitting and DP-SGD with (8, 10⁻⁵)-DP, accounting for privacy amplification with our theory yields higher validation accuracy than using existing accounting methods. Table 1. Comparison of three domain adaptation methods with a data subsampling rate of 0.1, under centralized setting for sample-level (8, 10⁻⁵)-DP. Noise standard deviation is relative to the clipping norm (and already divided by expected batch size). Validation accuracies are best of 3 random seeds and have standard deviations around 0.7%. |
| Researcher Affiliation | Collaboration | 1Department of Electrical Engineering, Stanford University, California, USA 2Microsoft. Correspondence to: Andy Dong <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Differentially Private Model-Parallel Training; Algorithm 2: Differentially Private Training with Dropout |
| Open Source Code | No | No explicit statement about providing source code for the methodology described in this paper is found. The paper mentions using "Google Differential Privacy Team. Google differential privacy library, 2024. URL https://github.com/google/differential-privacy" but this is a third-party library, not their own code. |
| Open Datasets | Yes | In this section, we train ResNet-101 on CIFAR-10 with model splitting under both centralized setting and federated setting, and analyze their respective privacy guarantees by using the techniques and theoretical results presented in Sections 3.2.1 and 3.2.2. ... For training WideResNet-40-4 from scratch on CIFAR-10, (8, 10⁻⁵)-DP, 2000 iterations |
| Dataset Splits | Yes | We split the dataset into two halves, each containing half of the images from each class. We use the first half to pre-train the model without DP, but only with 8 of the 10 classes. ... Then, we use the second half of the dataset for private finetuning. |
| Hardware Specification | No | Some of the computing for this project was performed on the Sherlock cluster. We would like to thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results. This mentions a cluster name but lacks specific hardware details (GPU/CPU models, memory, etc.). |
| Software Dependencies | No | The paper mentions using the "Google Differential Privacy library" but does not provide specific version numbers for this or any other key software components like programming languages or frameworks used for their implementation. |
| Experiment Setup | Yes | Under centralized training, we train for 1000 iterations using (8, 10⁻⁵)-DP. Under federated training, each user holds 2 samples and we train for 250 sessions. In each training session, each user trains their received model locally for 3 iterations, sends the model update to the server, and the server aggregates the model updates and adds noise to ensure a user-level (8, 10⁻⁵)-DP. To make the training compatible with DP, we replace the batch normalization layers in the model with group normalization layers. ... For training WideResNet-40-4 from scratch on CIFAR-10, (8, 10⁻⁵)-DP, 2000 iterations, using each sample 655 times for Balanced Iteration Subsampling and with probability 655/2000 in each iteration for Poisson Subsampling. |
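The Experiment Setup row contrasts two subsampling schemes with matched expected sample usage: Balanced Iteration Subsampling uses each sample exactly 655 times over 2000 iterations, while Poisson Subsampling includes each sample independently with probability 655/2000 per iteration. The sketch below illustrates that correspondence; the function names and the random-assignment strategy are illustrative assumptions, not the paper's implementation.

```python
import random

ITERATIONS = 2000          # training iterations, from the paper's setup
USES_PER_SAMPLE = 655      # exact per-sample usage under Balanced Iteration Subsampling
q = USES_PER_SAMPLE / ITERATIONS  # matching Poisson inclusion probability per iteration

def poisson_subsample(dataset, q, rng=random):
    """One iteration of Poisson subsampling: include each sample
    independently with probability q."""
    return [x for x in dataset if rng.random() < q]

def balanced_iteration_assignment(dataset, iterations, uses_per_sample, rng=random):
    """A sketch of Balanced Iteration Subsampling (assumed mechanics):
    assign each sample to exactly `uses_per_sample` distinct iterations,
    chosen uniformly at random, so every sample is used the same number
    of times over the run."""
    batches = [[] for _ in range(iterations)]
    for x in dataset:
        for t in rng.sample(range(iterations), uses_per_sample):
            batches[t].append(x)
    return batches
```

Under this matching, both schemes use each sample 655 times in expectation, but the privacy accounting differs because Poisson inclusion is independent across iterations while the balanced scheme fixes the per-sample count.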