Distributed Differentially Private Data Analytics via Secure Sketching

Authors: Jakob Burkhardt, Hannah Keller, Claudio Orlandi, Chris Schwiegelshohn

ICML 2025

Reproducibility Variable: Result, followed by the LLM response
Research Type: Experimental
  "5. Experimental Evaluation. In this section we present an empirical evaluation of differential privacy mechanisms for the low-rank approximation problem in the central, the local, and the LTM models. The local and LTM mechanisms we consider are algorithmically identical up to their noise rate. An immediate question is thus for which parametrizations of the noise rate we still achieve acceptable utility. In addition, we implement a Gamma/Laplace-based LTM mechanism to evaluate the price of achieving pure ε-differential privacy. We also experimentally evaluated the minimal computational overhead introduced by the use of MPC for distributing the execution of the LTM. In Appendix G we provide additional experiments for the ridge regression problem as well as a more in-depth description of the low-rank approximation experiments."
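The Gaussian mechanism referenced in this evaluation can be illustrated with a short sketch: perturb the covariance AᵀA with symmetric Gaussian noise and project A onto the top-k eigenvectors of the noisy covariance. This is a minimal, hypothetical rendering for illustration only; the paper's actual mechanisms, noise calibration to (ε, δ), and the LTM distribution via MPC are not reproduced here, and the function name and signature are assumptions.

```python
import numpy as np

def private_low_rank(A, k, sigma):
    """Illustrative Gaussian-mechanism sketch for low-rank approximation:
    perturb the covariance A^T A with symmetric Gaussian noise, then
    project A onto the top-k eigenvectors of the noisy covariance.
    NOTE: sigma must be calibrated to (epsilon, delta) and the
    sensitivity of A^T A; that calibration is omitted in this sketch."""
    d = A.shape[1]
    noise = np.random.normal(0.0, sigma, size=(d, d))
    noise = (noise + noise.T) / np.sqrt(2)  # symmetrize the noise matrix
    cov = A.T @ A + noise
    # eigh returns eigenvalues in ascending order; the last k columns
    # of eigvecs span the (privately estimated) top-k subspace
    eigvals, eigvecs = np.linalg.eigh(cov)
    V_k = eigvecs[:, -k:]
    return A @ V_k @ V_k.T  # rank-k approximation of A
```

The central, local, and LTM variants in the paper differ essentially in where this noise is injected and at what rate, which is why varying the noise rate directly probes the utility gap between the models.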
Researcher Affiliation: Academia
  "Aarhus University, Denmark. Correspondence to: Hannah Keller <EMAIL>."
Pseudocode: Yes
  "Algorithm 1 Linear Transformation Model"
Open Source Code: No
  The paper states: "We implemented our approach in MP-SPDZ (Keller, 2020) and provide our results in Appendix G.1." This indicates use of the third-party MP-SPDZ framework, not that the authors open-sourced their own implementation of the methodology described in this paper. There is no explicit code-release statement or repository link.
Open Datasets: Yes
  "Datasets. We evaluate our mechanism on synthetic and real datasets, enabling us to vary the number of sampled points n and interpolating between datasets of different sizes. The synthetic datasets have a large spectral gap from the kth to the (k + 1)st singular value with random bases, which emphasizes the performance difference between the various private mechanisms. The parameters of the four real-world datasets from the UC Irvine Machine Learning Repository are given in Table 2. For more thorough descriptions on the datasets, see Appendix G.2."
  Real-world datasets (Table 2, "Experimental evaluation of error ψ on real-world datasets"):
  - Power (Hebrail & Berard, 2012)
  - Elevation (Kaul, 2013)
  - Ethylene (Fonollosa, 2015)
  - Songs (Bertin-Mahieux, 2011)
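The synthetic construction described above, a random matrix whose kth singular value is well separated from the (k+1)st, with random singular bases, can be sketched as follows. The exact spectrum used in the paper is not specified, so the `gap` parameter and the function name are assumptions made for illustration.

```python
import numpy as np

def synthetic_gap_matrix(n, d, k, gap=100.0, seed=0):
    """Illustrative n-by-d matrix with a large spectral gap between the
    k-th and (k+1)-st singular values and uniformly random singular
    bases. The magnitude of the gap is an assumption, not taken from
    the paper."""
    rng = np.random.default_rng(seed)
    # random orthonormal bases via QR of Gaussian matrices
    U, _ = np.linalg.qr(rng.standard_normal((n, d)))
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))
    s = np.ones(d)
    s[:k] = gap  # top-k singular values dominate the tail
    return U @ np.diag(s) @ V.T
```

A large gap makes the best rank-k subspace nearly unique, so differences in error ψ between the private mechanisms are attributable to noise rather than to ambiguity in the target subspace.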
Dataset Splits: No
  The paper describes the generation of synthetic datasets and mentions using real-world datasets, but it does not specify any training/test/validation splits. For example, for synthetic datasets, it states: "We generate synthetic data by first sampling every entry in A from N(0, 1)... We vary parameters... For all combinations of those parameters, we generate 17 synthetic datasets of sizes {1000 · i^1.5 | i = 0, ..., 16}." This describes dataset generation, not how the data is partitioned into splits for experiments.
Hardware Specification: Yes
  Section G.1: "We implemented our approach in MP-SPDZ (Keller, 2020) and provide our results in Appendix G.1... using S servers from Amazon Web Services t3.large instances." Section G.2.1: "Experiments were executed on an Ubuntu 18.04 LTS machine, with an Intel Core i7-10510U CPU clocked at 1.8 GHz and 16 GB of RAM."
Software Dependencies: Yes
  "Software and Hardware. All mechanisms described in the main body are implemented in Python 3.6.9, making use of numpy. Experiments were executed on an Ubuntu 18.04 LTS machine, with an Intel Core i7-10510U CPU clocked at 1.8 GHz and 16 GB of RAM."
Experiment Setup: Yes
  "G.2.1. RIDGE REGRESSION... We vary parameters d ∈ {3, 10, 50}, ε ∈ {0.01, 0.03, 0.05, 0.1}, λ ∈ {1, 10, 100}, and µ² ∈ {1, n, n²}. For all combinations of those parameters, we generate 17 synthetic datasets of sizes {1000 · i^1.5 | i = 0, ..., 16}. For each of those we then measure ϕ for the SSP mechanism and the Gaussian mechanism with p ∈ {1, 0.9, 0.8, 0.7, 0.6, 0.5, 0}, by running the algorithms 30 times per dataset and reporting the average ϕ."
  "G.2.2. LOW-RANK APPROXIMATION... We vary parameters ε ∈ {0.05, 0.1, 0.5} and k ∈ {5, 10} for d = 50. For all combinations of parameters, we generate 17 synthetic datasets of sizes {1000 · i^1.5 | i = 0, ..., 16}. For each of those we then measure ψ for the central mechanism, the Laplace mechanism and the Gaussian mechanism with p ∈ {1, 0.9, 0.8, 0.7, 0.6, 0.5, 0}, by running the algorithms 20 times per dataset and reporting the average ψ."
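The low-rank parameter grid quoted from G.2.2 can be sketched as a simple sweep driver. This is a hypothetical reconstruction: `run_mechanism` stands in for the central, Laplace, or Gaussian mechanism (none of which are implemented here), and the function name, signature, and Gaussian data generation are assumptions based on the quoted setup.

```python
import itertools
import numpy as np

def sweep(run_mechanism, sizes=None, trials=20):
    """Hypothetical driver for the low-rank grid from G.2.2.
    `run_mechanism(A, k, eps, p)` is assumed to return the error psi
    for one run of a private mechanism on dataset A."""
    d = 50
    if sizes is None:
        # dataset sizes {1000 * i^1.5 | i = 0, ..., 16}; i = 0 gives an
        # empty dataset and is skipped below
        sizes = [int(1000 * i ** 1.5) for i in range(17)]
    results = {}
    for eps, k, p in itertools.product(
            [0.05, 0.1, 0.5], [5, 10], [1, 0.9, 0.8, 0.7, 0.6, 0.5, 0]):
        for n in sizes:
            if n == 0:
                continue
            A = np.random.normal(size=(n, d))  # every entry from N(0, 1)
            errs = [run_mechanism(A, k=k, eps=eps, p=p)
                    for _ in range(trials)]
            results[(eps, k, p, n)] = np.mean(errs)  # average psi
    return results
```

The ridge-regression grid in G.2.1 has the same shape (parameters d, ε, λ, µ², error ϕ, 30 trials), so the same driver pattern applies with the parameter lists swapped.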