Leveraging Randomness in Model and Data Partitioning for Privacy Amplification
Authors: Andy Dong, Wei-Ning Chen, Ayfer Ozgur
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that when training a 44 million parameter model with partial model splitting and DP-SGD with (8, 10⁻⁵)-DP, accounting for privacy amplification with our theory yields higher validation accuracy than using existing accounting methods. Table 1. Comparison of three domain adaptation methods with a data subsampling rate of 0.1, under centralized setting for sample-level (8, 10⁻⁵)-DP. Noise standard deviation is relative to the clipping norm (and already divided by expected batch size). Validation accuracies are best of 3 random seeds and have standard deviations around 0.7%. |
| Researcher Affiliation | Collaboration | 1Department of Electrical Engineering, Stanford University, California, USA 2Microsoft. Correspondence to: Andy Dong <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Differentially Private Model-Parallel Training; Algorithm 2: Differentially Private Training with Dropout |
| Open Source Code | No | No explicit statement about providing source code for the methodology described in this paper is found. The paper mentions using "Google Differential Privacy Team. Google differential privacy library, 2024. URL https://github.com/google/differential-privacy" but this is a third-party library, not their own code. |
| Open Datasets | Yes | In this section, we train ResNet-101 on CIFAR-10 with model splitting under both centralized setting and federated setting, and analyze their respective privacy guarantees by using the techniques and theoretical results presented in Sections 3.2.1 and 3.2.2. ... For training WideResNet-40-4 from scratch on CIFAR-10, (8, 10⁻⁵)-DP, 2000 iterations |
| Dataset Splits | Yes | We split the dataset into two halves, each containing half of the images from each class. We use the first half to pre-train the model without DP, but only with 8 of the 10 classes. ... Then, we use the second half of the dataset for private finetuning. |
| Hardware Specification | No | Some of the computing for this project was performed on the Sherlock cluster. We would like to thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results. This mentions a cluster name but lacks specific hardware details (GPU/CPU models, memory, etc.). |
| Software Dependencies | No | The paper mentions using the "Google Differential Privacy library" but does not provide specific version numbers for this or any other key software components like programming languages or frameworks used for their implementation. |
| Experiment Setup | Yes | Under centralized training, we train for 1000 iterations using (8, 10⁻⁵)-DP. Under federated training, each user holds 2 samples and we train for 250 sessions. In each training session, each user trains their received model locally for 3 iterations, sends the model update to the server, and the server aggregates the model updates and adds noise to ensure a user-level (8, 10⁻⁵)-DP. To make the training compatible with DP, we replace the batch normalization layers in the model with group normalization layers. ... For training WideResNet-40-4 from scratch on CIFAR-10, (8, 10⁻⁵)-DP, 2000 iterations, using each sample 655 times for Balanced Iteration Subsampling and with probability 655/2000 in each iteration for Poisson Subsampling. |
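The Experiment Setup row contrasts two subsampling schemes with matched expected sample usage: Balanced Iteration Subsampling uses each sample exactly 655 times over 2000 iterations, while Poisson Subsampling includes each sample independently with probability 655/2000 per iteration. The sketch below illustrates that correspondence; the function names and the random-assignment strategy are illustrative assumptions, not the paper's implementation.

```python
import random

ITERATIONS = 2000          # training iterations, from the paper's setup
USES_PER_SAMPLE = 655      # exact per-sample usage under Balanced Iteration Subsampling
q = USES_PER_SAMPLE / ITERATIONS  # matching Poisson inclusion probability per iteration

def poisson_subsample(dataset, q, rng=random):
    """One iteration of Poisson subsampling: include each sample
    independently with probability q."""
    return [x for x in dataset if rng.random() < q]

def balanced_iteration_assignment(dataset, iterations, uses_per_sample, rng=random):
    """A sketch of Balanced Iteration Subsampling (assumed mechanics):
    assign each sample to exactly `uses_per_sample` distinct iterations,
    chosen uniformly at random, so every sample is used the same number
    of times over the run."""
    batches = [[] for _ in range(iterations)]
    for x in dataset:
        for t in rng.sample(range(iterations), uses_per_sample):
            batches[t].append(x)
    return batches
```

Under this matching, both schemes use each sample 655 times in expectation, but the privacy accounting differs because Poisson inclusion is independent across iterations while the balanced scheme fixes the per-sample count.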