Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PriViT: Vision Transformers for Private Inference
Authors: Naren Dhyani, Jianqiao Cambridge Mo, Patrick Yubeaton, Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | PriViT either improves upon (or is competitive with) MPCViT both in terms of latency and accuracy on Tiny ImageNet as well as CIFAR-10/100. Therefore, PriViT presents a new algorithmic approach for designing privacy-friendly vision transformers; see Table 2 for quantitative comparisons. Table 2: Accuracy-latency tradeoffs between PriViT and MPCViT. All latencies are calculated with the Secretflow (Ma et al., 2023) framework using the SEMI2k (Cramer et al., 2018) protocol. Detailed methodology is reported in Appendix A. Left: Comparison of PriViT versus MPCViT on Tiny ImageNet. PriViT achieves 5.77× speedup for an isoaccuracy of approximately 64%. Right: Comparison of PriViT versus MPCViT on CIFAR-10. PriViT achieves 1.14× speedup for an isoaccuracy of approximately 94%. Bottom: Comparison of PriViT versus MPCViT on CIFAR-100. PriViT achieves 1.05× speedup for an isoaccuracy of approximately 78%. 4.2 Comparisons on standard benchmarks: We benchmark PriViT against MPCViT, using the checkpoints publicly shared by the authors of Zeng et al. (2022). We calculate latencies with the Secretflow (Ma et al., 2023) framework using the SEMI2k (Cramer et al., 2018) protocol. |
| Researcher Affiliation | Academia | Naren Dhyani, New York University; Jianqiao Mo, New York University; Patrick Yubeaton, New York University; Minsu Cho, New York University; Ameya Joshi, New York University; Siddharth Garg, New York University; Brandon Reagen, New York University; Chinmay Hegde, New York University |
| Pseudocode | No | The paper describes the method conceptually and provides an overview in Figure 15, but it does not present structured pseudocode or an algorithm block labeled as such. For example, Section 3.1 describes 'Switched Taylorization' and Section 3.2 describes 'Training PriViT models' in prose. |
| Open Source Code | No | We commit to releasing all code and data needed to reproduce our results post-peer review. |
| Open Datasets | Yes | We apply the PriViT algorithm to a pretrained checkpoint of ViT-Tiny (Steiner et al., 2021) that is trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224×224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224×224. The pretrained ViT-Tiny checkpoints are made available by (WinKawaks, 2022). In this research work we focus on finetuning an existing model checkpoint like ViT-Tiny on a target standard image classification dataset (CIFAR-10/100 (Krizhevsky et al., 2009) and Tiny-ImageNet). CIFAR-10/100 has images of size 32×32 while Tiny-ImageNet has 64×64. |
| Dataset Splits | Yes | CIFAR-10 has 10 classes with 5000 training images and 1000 test images per class. CIFAR-100 has 100 classes with 500 training images and 100 test images per class. Tiny-ImageNet has 200 classes with 500 training images and 50 test images per class. |
| Hardware Specification | No | The paper mentions that "All latencies are calculated with the Secretflow (Ma et al., 2023) framework using the SEMI2k (Cramer et al., 2018) protocol" and "We use the EMP Toolkit (Wang et al., 2016), a widely used GC framework, to generate GC circuits for nonlinear functions." However, it does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions several software components like "Secretflow (Ma et al., 2023) framework", "SEMI2k (Cramer et al., 2018) protocol", "EMP Toolkit (Wang et al., 2016)", "AdamW (Loshchilov & Hutter, 2017)", and "Adam (Kingma & Ba, 2014)". While these are software, specific version numbers for libraries like Python, PyTorch, CUDA, etc., are not provided. |
| Experiment Setup | Yes | ViT teacher pretraining. As the base model, we finetune a pretrained ViT-Tiny on CIFAR-10/100 for 10 epochs. We use AdamW (Loshchilov & Hutter, 2017) as the optimizer with an initial learning rate and weight decay of 0.0001 and 0.0001 respectively, and decay the learning rate after every 30 epochs by a factor of 0.1. We use the same hyperparameters for the Tiny ImageNet model. Joint optimization of student ViT and parametric nonlinearities. We use the Adam (Kingma & Ba, 2014) optimizer with learning rate equal to 0.0001. We use knowledge distillation and use soft labels generated by the teacher model with a temperature of 4. The total loss is then L = L_PriViT + L_KL, where L_PriViT is Equation 6 and L_KL is the KL divergence loss between the logits of the teacher and student model. The Lasso coefficients (Tibshirani, 1996) for the parametric attention and GELU masks are set to λg = 0.00003 and λs = 0.00003 respectively at the beginning of the search. We set warmup epochs to 5, during which we don't change any hyperparameters of the model. Post warmup, we increment λg by a multiplicative factor of 1.1 at the end of each epoch if the number of active GELUs of the current epoch does not decrease by at least 2 compared to the previous epoch. Note that a GELU/softmax is considered active if its corresponding auxiliary variable is greater than the threshold hyperparameter ε = 0.001. Binarizing parametric nonlinearities, finetuning. When the GELU and softmax budgets are satisfied, we binarize and freeze the GELU and softmax auxiliary variables. We subsequently finetune the model for 50 epochs using AdamW with a learning rate of 0.0001, weight decay of 0.0001, and a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016). Our finetuning approach continues to use knowledge distillation as before. |
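The experiment-setup row above combines a temperature-4 distillation loss with L1 (lasso) penalties on auxiliary gate variables and a multiplicative λg schedule. The following minimal Python sketch illustrates those three pieces under stated assumptions: all function and variable names are illustrative and are not from the authors' code, and the distillation term here is plain KL divergence on temperature-softened logits rather than the paper's full Equation 6.

```python
import math

# Illustrative constants quoted in the setup row.
TEMP = 4.0            # distillation temperature
EPSILON = 1e-3        # auxiliary variable is "active" above this threshold
LAMBDA_GROWTH = 1.1   # multiplicative increment for the lasso coefficient

def softmax(logits, temp=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(l / temp) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kd_kl_loss(teacher_logits, student_logits, temp=TEMP):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temp)
    q = softmax(student_logits, temp)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def lasso_penalty(aux_vars, lam):
    """L1 penalty (lasso) on the auxiliary GELU/softmax gate variables."""
    return lam * sum(abs(a) for a in aux_vars)

def count_active(aux_vars, eps=EPSILON):
    """A gate counts as active when its auxiliary variable exceeds eps."""
    return sum(1 for a in aux_vars if a > eps)

def update_lambda(lam, active_now, active_prev):
    """Post-warmup schedule: grow lambda_g by 1.1x at the end of an epoch
    unless the active-GELU count dropped by at least 2."""
    if active_prev - active_now < 2:
        return lam * LAMBDA_GROWTH
    return lam
```

For example, with λg = 0.00003 and an epoch where the active-GELU count only falls from 101 to 100, `update_lambda` returns 0.00003 × 1.1, matching the "increment by a multiplicative factor of 1.1" rule; a drop of 2 or more leaves λg unchanged.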