Differentially Private Multivariate Medians

Authors: Kelly Ramsay, Aukosh Jagannath, Shoja'eddin Chenouri

JMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate our results numerically using a Gaussian contamination model in dimensions up to d = 100, and compare them to a state-of-the-art private mean estimation algorithm. As a by-product of our work, we also provide a fast implementation of a modified version of the non-private integrated dual depth-based median. This implementation shows that this multivariate median can be computed quickly, even when the dimension is large. For example, we can compute the median of ten thousand 100-dimensional samples in less than one second on a personal computer.4 We call this modified version of the integrated dual depth the smoothed integrated dual depth, see Section 6.1 for more details. Our code is available on Git Hub. Aside from the private smoothed integrated dual depth median, we also computed the non-private smoothed integrated dual depth median and the private coin-press mean of Biswas et al. (2020). When computing the coin-press mean, we used the existing Git Hub implementation provided by the authors. The bounding ball for the coin-press algorithm had radius 10 d and was run for four iterations. We simulated fifty instances with n = 10, 000 over a range of dimensions from two to one hundred. The data were either generated from a standard Gaussian measure or contaminated standard Gaussian measure. In the contaminated model, 25% of the observations had mean (5, . . . , 5). We tested privacy parameters ϵ = 2, 5 and 10, the smoothing parameter s was fixed at 10 and π = N(0, 25d I). Figure 2 shows the empirical root mean squared error (ERMSE) for the different location estimators as the dimension increases when ϵ = 10. Results for the remaining values of ϵ can be seen in Appendix D. Notice that the private median ERMSE grows at a similar rate as that of the non-private ERMSE. The mean estimators perform very well in the uncontaminated setting, but poorly in the contaminated setting. In particular, the private median outperforms the coin-press mean. The median estimators do not perform as well as the mean estimators in the standard Gaussian setting, but do not become corrupted in the contaminated setting. This demonstrates the usual trade-offbetween accuracy and robustness between the mean and the median.
Researcher Affiliation Academia Kelly Ramsay EMAIL Department of Mathematics and Statistics York University North York, ON M3J1P3, Canada Aukosh Jagannath EMAIL Department of Statistics and Actuarial Science University of Waterloo Waterloo, ON N2L3G1, Canada Shoja eddin Chenouri EMAIL Department of Statistics and Actuarial Science University of Waterloo Waterloo, ON N2L3G1, Canada
Pseudocode No The paper does not contain any structured pseudocode or algorithm blocks. It describes methodologies in narrative text and refers to existing algorithms.
Open Source Code Yes Our code is available on Git Hub.
Open Datasets No The paper uses simulated data generated from "a standard Gaussian measure or contaminated standard Gaussian measure." It does not utilize or provide concrete access to a publicly available or open dataset.
Dataset Splits No The paper describes generating synthetic data for simulations ("We simulated fifty instances with n = 10, 000 over a range of dimensions from two to one hundred. The data were either generated from a standard Gaussian measure or contaminated standard Gaussian measure."), but it does not specify any training/test/validation dataset splits typically used for reproducing experiments on pre-existing datasets.
Hardware Specification Yes The code was run on a desktop computer with an Intel i7-8700K 3.70GHz chipset and an Nvidia GTX 1660 super graphics card.
Software Dependencies No The paper mentions using "Markov chain Monte Carlo (MCMC) methods" and an "existing Git Hub implementation provided by the authors" for the coin-press mean, but it does not provide specific version numbers for any software dependencies used in their own implementation.
Experiment Setup Yes The bounding ball for the coin-press algorithm had radius 10 d and was run for four iterations. We tested privacy parameters ϵ = 2, 5 and 10, the smoothing parameter s was fixed at 10 and π = N(0, 25d I).