Differentially private optimization for non-decomposable objective functions

Authors: Weiwei Kong, Andrés Muñoz Medina, Mónica Ribero

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We test our DP-SGD variant on some CIFAR-10 pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
Researcher Affiliation Industry Weiwei Kong, Andrés Muñoz Medina & Mónica Ribero, Google Research, New York, NY, USA, EMAIL
Pseudocode Yes Algorithm 1: Logit-DP
Input: sensitivity bound B > 0, sensitivity constants G1, G2, L > 0, dataset D = {(x_i, x'_i)}_{i=1}^N, batch size n, iteration limit T ≥ 1, stepsize η > 0, noise multiplier σ > 0, model Φ
Output: embedding model Φ_{w_T}
1. Initialize weights w_0 in Φ
2. Compute gradient sensitivity C = (G1 + G2 + nL)B
3. for t = 1, 2, ..., T − 1 do
4.   Sample batch X = {(x_1, x'_1), ..., (x_n, x'_n)}
5.   for i, j = 1, ..., n do
6.     Compute similarity gradients ∇Z_X^{ij}(w_t) = ∇_{w_t} S(Φ_{w_t}(x_i), Φ_{w_t}(x'_j))
7.     Clip gradients to obtain Clip_B(∇Z_X^{ij}(w_t)) = min{B / ‖∇Z_X^{ij}(w_t)‖, 1} · ∇Z_X^{ij}(w_t)
8.   Compute g̃ using (5)
9.   Compute noisy gradient ĝ = g̃ + Y with Y ~ N(0, σC I_p)
10.  Update the model w_{t+1} = w_t − η ĝ
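The per-pair clipping and noisy-update steps of Algorithm 1 can be sketched in NumPy. The paper's aggregation rule, its equation (5), is not reproduced in this report, so a plain sum of clipped per-pair gradients stands in for it here; the function names and the flat-vector gradient representation are assumptions.

```python
import numpy as np


def clip_to_norm(g, B):
    """Rescale g so its l2 norm is at most B, i.e. min{B/||g||, 1} * g."""
    norm = np.linalg.norm(g)
    return g * min(B / norm, 1.0) if norm > 0 else g


def logit_dp_step(pair_grads, w, eta, B, C, sigma, rng):
    """One sketched Logit-DP update: clip each per-pair similarity
    gradient, aggregate (a plain sum stands in for the paper's eq. (5)),
    add Gaussian noise Y ~ N(0, (sigma * C)^2 I), and take a step."""
    g = np.zeros_like(w)
    for grad in pair_grads:  # gradients of S(Phi(x_i), Phi(x'_j)), i, j = 1..n
        g += clip_to_norm(grad, B)
    noise = rng.normal(0.0, sigma * C, size=w.shape)
    return w - eta * (g + noise)
```

With sigma = 0 the step reduces to ordinary clipped gradient descent, which makes the clipping behavior easy to check in isolation.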
Open Source Code Yes Our code is publicly available at https://github.com/google-research/google-research/tree/master/logit_dp
Open Datasets Yes We test our DP-SGD variant on some CIFAR-10 pre-training and CIFAR-100 finetuning tasks
Dataset Splits Yes All variants used the standard Adam optimizer for training and used the canonical 80-20 train-test split of the CIFAR10 dataset.
Hardware Specification Yes All models were trained on a single NVIDIA V100 GPU using a cloud computing platform with 512 GB of RAM.
Software Dependencies No The paper mentions "All variants used the standard Adam optimizer for training" but does not specify version numbers for any software dependencies (e.g., the framework providing the Adam implementation).
Experiment Setup Yes The learning rates for Logit-DP, Naive-DP, and Non-Private were 10^-2, 10^-2, and 10^-3, respectively, for the generic embedding net experiments and 10^-4, 10^-3, and 10^-2, respectively, for the ResNet18 experiments. All variants used the standard Adam optimizer for training and used the canonical 80-20 train-test split of the CIFAR10 dataset. However, Logit-DP used 25 and 100 gradient accumulation steps for the generic embedding net and ResNet18 experiments, respectively. The batch size during training was 10,000 and 1,000 for the generic embedding net and ResNet18 experiments, respectively, and the entire testing dataset was used for evaluating test metrics. Moreover, each variant was run for 20 and 2 epochs over the entire training dataset for the generic embedding net and ResNet18 experiments in Table 1, respectively. For the DP variants, we fixed the desired ℓ2 sensitivity to be 10^-4 and 10^-5 for Naive-DP and Logit-DP, respectively, in the generic embedding net experiments and 10^-3 and 10^-5, respectively, in the ResNet18 experiments. All DP methods chose a noise multiplier so that ε-DP was achieved for ε = 5.0. Finally, all hyperparameter tuning was done through a grid search of various learning rates (10^-5, 10^-4, ..., 10^-2) and ℓ2 sensitivities (10^-6, 10^-5, ..., 10^0).
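The grid search described above (learning rates 10^-5 through 10^-2, ℓ2 sensitivities 10^-6 through 10^0) can be sketched as follows; `evaluate` is a hypothetical stand-in for training a model at one configuration and returning a test metric, since the paper's tuning code is not reproduced here.

```python
import itertools

# Learning-rate and l2-sensitivity grids as described in the setup above.
LEARNING_RATES = [10.0 ** e for e in range(-5, -1)]  # 1e-05, 1e-04, 1e-03, 1e-02
SENSITIVITIES = [10.0 ** e for e in range(-6, 1)]    # 1e-06 ... 1e+00


def grid_search(evaluate):
    """Return the (learning_rate, sensitivity) pair maximizing
    evaluate(lr, sens), sweeping the full 4 x 7 = 28-point grid."""
    return max(itertools.product(LEARNING_RATES, SENSITIVITIES),
               key=lambda cfg: evaluate(*cfg))
```

The exhaustive sweep is cheap relative to training cost here (28 configurations per model variant), which matches the paper's choice of a simple grid over a more elaborate tuner.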