Differentially private optimization for non-decomposable objective functions

Authors: Weiwei Kong, Andrés Muñoz Medina, Mónica Ribero

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We test our DP-SGD variant on some CIFAR-10 pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
Researcher Affiliation Industry Weiwei Kong, Andrés Muñoz Medina & Mónica Ribero, Google Research, New York, NY, USA, EMAIL
Pseudocode Yes Algorithm 1: Logit-DP
Input: sensitivity bound B > 0, sensitivity constants G1, G2, L > 0, dataset D = {(x_i, x'_i)}_{i=1}^N, batch size n, iteration limit T ≥ 1, stepsize η > 0, noise multiplier σ > 0, model Φ
Output: embedding model Φ_{w_T}
1. Initialize weights w_0 in Φ
2. Compute gradient sensitivity C = (G1 + G2 + nL)B
3. for t = 1, 2, ..., T − 1 do
4.   Sample batch X = {(x_1, x'_1), ..., (x_n, x'_n)}
5.   for i, j = 1, ..., n do
6.     Compute similarity gradients ∇Z_X^{ij}(w_t) = ∇_{w_t} S(Φ_{w_t}(x_i), Φ_{w_t}(x'_j))
7.     Clip gradients to obtain Clip_B(∇Z_X^{ij}(w_t)) = min{B / ‖∇Z_X^{ij}(w_t)‖, 1} · ∇Z_X^{ij}(w_t)
8.   Compute g̃ using (5)
9.   Compute noisy gradient ĝ = g̃ + Y with Y ~ N(0, σC I_p)
10.  Update the model w_{t+1} = w_t − η ĝ
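The per-pair clipping and noisy-update steps of Algorithm 1 can be sketched in NumPy. The paper's aggregation rule, its equation (5), is not reproduced in this report, so a plain sum of clipped per-pair gradients stands in for it here; the function names and the flat-vector gradient representation are assumptions.

```python
import numpy as np


def clip_to_norm(g, B):
    """Rescale g so its l2 norm is at most B, i.e. min{B/||g||, 1} * g."""
    norm = np.linalg.norm(g)
    return g * min(B / norm, 1.0) if norm > 0 else g


def logit_dp_step(pair_grads, w, eta, B, C, sigma, rng):
    """One sketched Logit-DP update: clip each per-pair similarity
    gradient, aggregate (a plain sum stands in for the paper's eq. (5)),
    add Gaussian noise Y ~ N(0, (sigma * C)^2 I), and take a step."""
    g = np.zeros_like(w)
    for grad in pair_grads:  # gradients of S(Phi(x_i), Phi(x'_j)), i, j = 1..n
        g += clip_to_norm(grad, B)
    noise = rng.normal(0.0, sigma * C, size=w.shape)
    return w - eta * (g + noise)
```

With sigma = 0 the step reduces to ordinary clipped gradient descent, which makes the clipping behavior easy to check in isolation.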
Open Source Code Yes Our code is publicly available at https://github.com/google-research/google-research/tree/master/logit_dp
Open Datasets Yes We test our DP-SGD variant on some CIFAR-10 pre-training and CIFAR-100 finetuning tasks
Dataset Splits Yes All variants used the standard Adam optimizer for training and used the canonical 80-20 train-test split of the CIFAR10 dataset.
Hardware Specification Yes All models were trained on a single NVIDIA V100 GPU using a cloud computing platform with 512 GB of RAM.
Software Dependencies No The paper mentions "All variants used the standard Adam optimizer for training" but does not specify version numbers for any software dependencies (e.g., the framework providing the Adam implementation).
Experiment Setup Yes The learning rates for Logit-DP, Naive-DP, and Non-Private were 10^-2, 10^-2, and 10^-3, respectively, for the generic embedding net experiments and 10^-4, 10^-3, and 10^-2, respectively, for the ResNet18 experiments. All variants used the standard Adam optimizer for training and used the canonical 80-20 train-test split of the CIFAR10 dataset. However, Logit-DP used 25 and 100 gradient accumulation steps for the generic embedding net and ResNet18 experiments, respectively. The batch size during training was 10,000 and 1,000 for the generic embedding net and ResNet18 experiments, respectively, and the entire testing dataset was used for evaluating test metrics. Moreover, each variant was run for 20 and 2 epochs over the entire training dataset for the generic embedding net and ResNet18 experiments in Table 1, respectively. For the DP variants, we fixed the desired ℓ2 sensitivity to be 10^-4 and 10^-5 for Naive-DP and Logit-DP, respectively, in the generic embedding net experiments and 10^-3 and 10^-5, respectively, in the ResNet18 experiments. All DP methods chose a noise multiplier so that ε-DP was achieved for ε = 5.0. Finally, all hyperparameter tuning was done through a grid search of various learning rates (10^-5, 10^-4, ..., 10^-2) and ℓ2 sensitivities (10^-6, 10^-5, ..., 10^0).
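The grid search described above (learning rates 10^-5 through 10^-2, ℓ2 sensitivities 10^-6 through 10^0) can be sketched as follows; `evaluate` is a hypothetical stand-in for training a model at one configuration and returning a test metric, since the paper's tuning code is not reproduced here.

```python
import itertools

# Learning-rate and l2-sensitivity grids as described in the setup above.
LEARNING_RATES = [10.0 ** e for e in range(-5, -1)]  # 1e-05, 1e-04, 1e-03, 1e-02
SENSITIVITIES = [10.0 ** e for e in range(-6, 1)]    # 1e-06 ... 1e+00


def grid_search(evaluate):
    """Return the (learning_rate, sensitivity) pair maximizing
    evaluate(lr, sens), sweeping the full 4 x 7 = 28-point grid."""
    return max(itertools.product(LEARNING_RATES, SENSITIVITIES),
               key=lambda cfg: evaluate(*cfg))
```

The exhaustive sweep is cheap relative to training cost here (28 configurations per model variant), which matches the paper's choice of a simple grid over a more elaborate tuner.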