Differentially private optimization for non-decomposable objective functions
Authors: Weiwei Kong, Andrés Muñoz Medina, Mónica Ribero
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our DP-SGD variant on some CIFAR-10 pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss. |
| Researcher Affiliation | Industry | Weiwei Kong, Andrés Muñoz Medina & Mónica Ribero, Google Research, New York, NY, USA, EMAIL |
| Pseudocode | Yes | Algorithm 1: Logit-DP. Input: sensitivity bound B > 0, sensitivity constants G1, G2, L > 0, dataset D = {(x_i, x'_i)}_{i=1}^N, batch size n, iteration limit T ≥ 1, stepsize η > 0, noise multiplier σ > 0, model Φ. Output: embedding model Φ_{w_T}. 1: Initialize weights w_0 in Φ; 2: Compute gradient sensitivity C = (G1 + G2 + nL)B; 3: for t = 1, 2, ..., T do; 4: Sample batch X = {(x_1, x'_1), ..., (x_n, x'_n)}; 5: for i, j = 1, ..., n do; 6: Compute similarity gradients Z^{ij}_X(w_t) = ∇_{w_t} S(Φ_{w_t}(x_i), Φ_{w_t}(x'_j)); 7: Clip gradients to obtain Clip_B(Z^{ij}_X(w_t)) = min{B/‖Z^{ij}_X(w_t)‖, 1} · Z^{ij}_X(w_t); 8: Compute ḡ using (5); 9: Compute noisy gradient g̃ = ḡ + Y with Y ~ N(0, σC·I_p); 10: Update the model w_{t+1} = w_t − η g̃. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/google-research/google-research/ tree/master/logit_dp |
| Open Datasets | Yes | We test our DP-SGD variant on some CIFAR-10 pre-training and CIFAR-100 finetuning tasks |
| Dataset Splits | Yes | All variants used the standard Adam optimizer for training and used the canonical 80-20 train-test split of the CIFAR10 dataset. |
| Hardware Specification | Yes | All models were trained on a single NVIDIA V100 GPU using a cloud computing platform with 512 GB of RAM. |
| Software Dependencies | No | The paper mentions "All variants used the standard Adam optimizer for training" but does not specify a version number for Adam or any other software dependencies. |
| Experiment Setup | Yes | The learning rates for Logit-DP, Naive-DP, and Non-Private were 10⁻², 10⁻², and 10⁻³, respectively, for the generic embedding net experiments and 10⁻⁴, 10⁻³, and 10⁻², respectively, for the ResNet18 experiments. All variants used the standard Adam optimizer for training and used the canonical 80-20 train-test split of the CIFAR10 dataset. However, Logit-DP used 25 and 100 gradient accumulation steps for the generic embedding net and ResNet18 experiments, respectively. The batch size during training was 10,000 and 1,000 for the generic embedding net and ResNet18 experiments, respectively, and the entire testing dataset was used for evaluating test metrics. Moreover, each variant was run for 20 and 2 epochs over the entire training dataset for the generic embedding net and ResNet18 experiments in Table 1, respectively. For the DP variants, we fixed the desired ℓ2 sensitivity to be 10⁻⁴ and 10⁻⁵ for Naive-DP and Logit-DP, respectively, in the generic embedding net experiments and 10⁻³ and 10⁻⁵, respectively, in the ResNet18 experiments. All DP methods chose a noise multiplier so that ε-DP was achieved for ε = 5.0. Finally, all hyperparameter tuning was done through a grid search of various learning rates (10⁻⁵, 10⁻⁴, ..., 10⁻²) and ℓ2 sensitivities (10⁻⁶, 10⁻⁵, ..., 10⁰). |
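The core of the quoted Algorithm 1 is per-pair gradient clipping followed by Gaussian noise scaled by the sensitivity C = (G1 + G2 + nL)B. The sketch below illustrates that step in NumPy. It is not the authors' implementation (their code is in the linked repository): the aggregation "compute ḡ using (5)" is not reproduced in this excerpt, so a plain mean over clipped per-pair gradients stands in for it, and the function names `clip_to_norm` and `logit_dp_step` are hypothetical.

```python
import numpy as np

def clip_to_norm(z, B):
    """Clip vector z to l2 norm at most B: min(B/||z||, 1) * z."""
    norm = np.linalg.norm(z)
    return z * min(B / norm, 1.0) if norm > 0 else z

def logit_dp_step(pair_grads, B, G1, G2, L, n, sigma, rng):
    """One noisy-gradient step in the spirit of Algorithm 1 (Logit-DP).

    pair_grads: (n, n, p) array of per-pair similarity gradients Z^{ij}.
    Clips each pair gradient to norm B, aggregates, and adds Gaussian
    noise scaled by the sensitivity C = (G1 + G2 + n*L) * B.
    """
    n_rows, n_cols, p = pair_grads.shape
    clipped = np.stack([
        np.stack([clip_to_norm(pair_grads[i, j], B) for j in range(n_cols)])
        for i in range(n_rows)
    ])
    # Stand-in for the paper's equation (5): a plain mean over pairs.
    g = clipped.reshape(-1, p).mean(axis=0)
    noise_scale = sigma * (G1 + G2 + n * L) * B
    return g + rng.normal(0.0, noise_scale, size=p)
```

With `sigma = 0` the function reduces to the clipped aggregate, which is a convenient way to sanity-check the clipping in isolation.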
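The hyperparameter search described in the Experiment Setup cell is a plain grid over four learning rates (10⁻⁵ to 10⁻²) and seven ℓ2 sensitivities (10⁻⁶ to 10⁰). A minimal sketch of that grid, assuming a full cross-product as the quoted text suggests:

```python
import itertools

# Learning rates 1e-5, 1e-4, 1e-3, 1e-2 (as quoted in the setup).
learning_rates = [10 ** -k for k in range(5, 1, -1)]
# l2 sensitivities 1e-6 through 1e0 (as quoted in the setup).
sensitivities = [10 ** -k for k in range(6, -1, -1)]

# Full cross-product: 4 x 7 = 28 candidate configurations.
grid = list(itertools.product(learning_rates, sensitivities))
```

Each `(learning_rate, sensitivity)` pair would then be used to train and evaluate one model variant.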