DeepReShape: Redesigning Neural Networks for Efficient Private Inference

Authors: Nandan Kumar Jha, Brandon Reagen

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate DeepReShape using standard PI benchmarks and demonstrate a 2.1% accuracy gain with a 5.2× runtime improvement at iso-ReLU on CIFAR-100, and an 8.7× runtime improvement at iso-accuracy on TinyImageNet. Furthermore, we investigate the significance of network selection in prior ReLU optimizations and shed light on the key network attributes for superior PI performance."
Researcher Affiliation | Academia | Nandan Kumar Jha (EMAIL), New York University; Brandon Reagen (EMAIL), New York University
Pseudocode | Yes | Algorithm 1 (ReLU equalization). Input: a network Net with stages S1, ..., SD; C, a sorted list of stages from most to least critical; stage-compute ratios φ1, ..., φD; and stagewise channel multiplication factors λ1, ..., λ(D−1). Output: ReLU-equalized versions of network Net. ... Algorithm 2 (ReLU optimization steps employed in HybReNets (HRNs)). Input: a network Net with D stages S1, S2, ..., SD and C, a sorted list of stages from least to most critical. Output: ReLU-optimized versions of Net.
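The interface of Algorithm 1 can be illustrated with a loose Python sketch. The function name, the channel-scaling rule, and the per-stage ReLU count (channels × H × W) are assumptions made for illustration; the paper's actual equalization procedure is only excerpted above.

```python
def relu_equalization(base_channels, spatial_sizes, criticality_order, multipliers):
    """Loose sketch of Algorithm 1's interface: derive stagewise channel
    counts from the multiplication factors lambda_1..lambda_{D-1}, then
    check whether per-stage ReLU counts follow the criticality order.
    The scaling rule and ReLU count (channels * H * W) are assumptions."""
    channels = [base_channels]
    for lam in multipliers:          # lambda_i relates stage i to stage i+1
        channels.append(int(channels[-1] * lam))
    relus = [c * s * s for c, s in zip(channels, spatial_sizes)]
    # The most critical stage (first in criticality_order) should hold the most ReLUs.
    ordered = [relus[i] for i in criticality_order]
    equalized = all(a >= b for a, b in zip(ordered, ordered[1:]))
    return channels, relus, equalized
```

For a hypothetical 4-stage CIFAR-style network (16 base channels, doubling factors, halving feature maps), `relu_equalization(16, [32, 16, 8, 4], [0, 1, 2, 3], [2, 2, 2])` reports the stagewise channels, ReLU counts, and whether the counts decrease along the criticality order.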
Open Source Code | No | The paper discusses the use of a third-party tool, Microsoft SEAL (release 4.0, with a GitHub link), but does not provide any explicit statement or link indicating that the source code for the methodology described in this paper (DeepReShape, HybReNets) is publicly available. There is no mention of the authors' own code being released.
Open Datasets | Yes | "We performed our experiments on CIFAR-100 (Krizhevsky et al., 2010) and TinyImageNet (Le & Yang, 2015; Yao & Miller, 2015), as prior PI-specific network optimization studies (Jha et al., 2021; Cho et al., 2022b; Kundu et al., 2023) used these datasets."
Dataset Splits | Yes | "CIFAR-100 consists of 100 classes, each with 500 training images and 100 test images of resolution 32×32. TinyImageNet includes 200 classes, each with 500 training images and 50 validation images with a resolution of 64×64."
Hardware Specification | Yes | "Our experimental setup involved an AMD EPYC 7502 server (2.5 GHz, 32 cores, 256 GB RAM). The client and server were simulated as two separate processes operating on the same machine. We set the number of threads to four to compute the GC latency."
Software Dependencies | Yes | "Specifically, we used Microsoft SEAL to compute the homomorphic encryption (HE) latency for convolution and fully connected operations, and DELPHI (Mishra et al., 2020) to compute the garbled-circuit (GC) latency for ReLU operations. ... Microsoft SEAL (release 4.0). https://github.com/Microsoft/SEAL, March 2022. Microsoft Research, Redmond, WA."
Experiment Setup | Yes | "For training on CIFAR-100 and TinyImageNet, we used a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016) with an initial learning rate of 0.1, a mini-batch size of 128, a momentum of 0.9, and a weight decay factor of 5e-4. We trained the networks for 200 epochs on both datasets, with an additional 20 warmup epochs for Decoupled KD (Zhao et al., 2022). For DeepReDuce and KD experiments in Tables 6, 13, and 14, we employed Hinton's knowledge distillation (Hinton et al., 2015) with a temperature of 4 and a relative weight to cross-entropy loss of 0.9."
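The cosine annealing schedule referenced above can be sketched as a stand-alone formula. This is a minimal illustration, not the authors' training code; the paper's runs presumably used a framework scheduler, and the minimum learning rate of 0 is an assumption.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=200, lr_max=0.1, lr_min=0.0):
    """Cosine annealing (Loshchilov & Hutter, 2016): decay the learning
    rate from lr_max at epoch 0 to lr_min at epoch total_epochs.
    Defaults mirror the reported setup (initial LR 0.1, 200 epochs);
    lr_min = 0 is an assumption."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))
```

Under these defaults the schedule starts at 0.1, passes through 0.05 at the halfway point (epoch 100), and reaches 0 at epoch 200.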