Towards Large Scale Transfer Learning for Differentially Private Image Classification
Authors: Harsh Mehta, Abhradeep Guha Thakurta, Alexey Kurakin, Ashok Cutkosky
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we zoom in on the ImageNet dataset and demonstrate that, similar to the non-private case, pre-training over-parameterized models on a large public dataset can lead to substantial gains when the models are finetuned privately. Moreover, by systematically comparing private and non-private models across a range of large batch sizes, we find that, similar to the non-private setting, the choice of optimizer can further improve performance substantially with DP. By using the LAMB optimizer, we saw improvements of up to 20 percentage points (absolute). We also show that finetuning just the last layer for a single step in the full batch setting, combined with extremely small-scale (near-zero) initialization, leads to both SOTA results of 81.7% under a wide privacy budget range of ε ∈ [4, 10] and δ = 10⁻⁶ while minimizing the computational overhead substantially. Finally, we present additional results on CIFAR-10 and CIFAR-100, surpassing previous state of the art by leveraging transfer learning with our recommendations. |
| Researcher Affiliation | Collaboration | Harsh Mehta (Google Research), Abhradeep Thakurta (Google Research), Alexey Kurakin (Google Research), Ashok Cutkosky (Boston University) |
| Pseudocode | Yes | A Algorithmic details. We present below a generalized version of DP-SGD where the gradients are processed in the traditional DP-SGD fashion and are then passed to a first-order optimizer as an input. This lets us instantiate DP versions of well-known optimizers like SGD, Momentum, Adam, and LAMB. We prepend the optimizer's name with DP to denote that the gradients were first processed as shown in Algorithm 1 and then passed to the said optimizer. Algorithm 1: Generalized First Order Differentially Private Algorithm |
| Open Source Code | Yes | Code: https://github.com/google-research/google-research/tree/master/dp_transfer |
| Open Datasets | Yes | Datasets. We use the ILSVRC-2012 ImageNet dataset (Deng et al., 2009) with 1k classes and 1.3M images (we refer to it as ImageNet in what follows) as our final evaluation dataset. However, we provide supplementary results in Section F where we evaluate on 2 additional datasets, namely CIFAR-10 and CIFAR-100. |
| Dataset Splits | No | We finetune on the ImageNet train split and present the Top-1 accuracies we obtain from the official test split. The paper implies standard splits but does not provide explicit percentages, counts, or a citation detailing the splits used. |
| Hardware Specification | Yes | Finally, we conduct our experiments on TPUv4 architecture. All our models were pre-trained using TPUv4 hardware with exact amounts depending on the model. All models were trained using 64 TPUv4 cores. |
| Software Dependencies | No | Our implementation relies on the Tensorflow Privacy codebase for conversion of (ε, δ) and clipping norm C to/from noise multiplier σ. We conduct all our experiments using the Scenic library (Dehghani et al., 2021) for high-quality reproducible implementations of both ResNet (BiT) and Vision Transformers. Scenic, in turn, uses Flax (Heek et al., 2020) for many of the layer definitions. For the privacy accounting, we rely on the default Rényi accountant implementation already open-sourced as part of the Tensorflow Privacy library. No specific version numbers are provided for these software dependencies (Tensorflow Privacy, Jax, Scenic, Flax). |
| Experiment Setup | Yes | At the pre-training stage, we stick with the common practice of employing the Adam optimizer (even for ResNet) (Kingma & Ba, 2014) with β1 = 0.9 and β2 = 0.999, with a batch size of 4096 and high weight decay of 0.1 unless mentioned otherwise. We train with sigmoid cross-entropy loss and use linear learning rate warmup until 10k steps, followed by linear decay until the end of training. For our private finetuning experiments, we stick with a reasonably stringent privacy guarantee of ε = 10 and δ = 10⁻⁶, unless specified otherwise. We use DP-SGD privacy analysis to compute the noise multiplier. To limit other confounding factors we set the clipping norm C to 1. Also, since training with DP-SGD is computationally expensive, we finetune on ImageNet for at most 10 epochs. Finally, when training the last layer with DP we found it crucial to initialize the last layer weights to zero (or a small value). (Section 4, Training details) Additionally, Tables 7 and 8 provide detailed finetuning hyperparameters, and Section E.3 describes the setup for Figure 1b, including single-step, full-batch training, zero initialization, and specific input resolutions. |
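The generalized DP step the paper describes (Algorithm 1: clip per-example gradients, average, add Gaussian noise, then hand the result to any first-order optimizer such as SGD, Momentum, Adam, or LAMB) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation; the function name and the `rng` parameter are our own.

```python
import numpy as np

def dp_process_gradients(per_example_grads, clip_norm=1.0,
                         noise_multiplier=1.0, rng=None):
    """Sketch of the generalized DP-SGD gradient step.

    Each per-example gradient is clipped to L2 norm `clip_norm`,
    the clipped gradients are averaged, and Gaussian noise with
    std = noise_multiplier * clip_norm / batch_size is added.
    The returned vector can be fed to any first-order optimizer
    (SGD, Momentum, Adam, LAMB) to get its DP-prefixed variant.
    """
    rng = rng or np.random.default_rng(0)
    batch = per_example_grads.shape[0]
    # Per-example L2 norms, flattening any parameter shape.
    norms = np.linalg.norm(per_example_grads.reshape(batch, -1), axis=1)
    # Scale factor min(1, C / ||g_i||) clips without changing direction.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale.reshape(batch, *([1] * (per_example_grads.ndim - 1)))
    mean_grad = clipped.mean(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / batch,
                       size=mean_grad.shape)
    return mean_grad + noise
```

With `noise_multiplier=0` the function reduces to plain clipped-gradient averaging, which makes the clipping behavior easy to check in isolation.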
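The paper's key recipe (privately finetuning only the last layer for a single step on the full batch, with the layer initialized to zero) can also be sketched. The code below is an illustrative toy under our own assumptions (softmax cross-entropy on frozen features, function and argument names are hypothetical), not the paper's code: at zero init the logits are zero and the predicted distribution is uniform, so the per-example gradient with respect to the weights is simply xᵢ(pᵢ − yᵢ)ᵀ, which is then clipped, averaged, noised, and applied in one step.

```python
import numpy as np

def private_linear_probe_step(features, labels_onehot, clip_norm=1.0,
                              noise_multiplier=0.0, lr=1.0, rng=None):
    """One DP full-batch step on a zero-initialized last layer.

    `features` are frozen pre-trained representations (n, d);
    `labels_onehot` is (n, k). Returns the weight matrix after a
    single noisy clipped-gradient update.
    """
    rng = rng or np.random.default_rng(0)
    n, d = features.shape
    k = labels_onehot.shape[1]
    W = np.zeros((d, k))                      # near-zero init, as recommended
    logits = features @ W                     # all zeros at init
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Per-example gradient of softmax cross-entropy w.r.t. W: x_i (p_i - y_i)^T
    per_ex = features[:, :, None] * (probs - labels_onehot)[:, None, :]
    norms = np.linalg.norm(per_ex.reshape(n, -1), axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_ex * scale[:, None, None]
    g = clipped.mean(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm / n, size=(d, k))
    return W - lr * g
```

Because the batch is the full dataset and only one step is taken, only a single noisy gradient release is needed, which is what keeps the computational (and privacy-accounting) overhead small in this regime.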