Differentially Private Topological Data Analysis

Authors: Taegyu Kang, Sehwan Kim, Jinwon Sohn, Jordan Awan

JMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the performance of our privatized persistence diagrams through simulations as well as on a real data set tracking human movement.
Researcher Affiliation Academia Taegyu Kang EMAIL Department of Statistics Purdue University West Lafayette, IN 47907, USA Sehwan Kim EMAIL Department of Population Medicine Harvard Medical School/Harvard Pilgrim Health Care Institute Boston, MA, 02215, USA Jinwon Sohn EMAIL Department of Statistics Purdue University West Lafayette, IN 47907, USA Jordan Awan EMAIL Department of Statistics Purdue University West Lafayette, IN 47907, USA
Pseudocode Yes Algorithm 1 MCMC implementation of the exponential mechanism Input: P0(D), P1(D), and a positive integer M Initialization: P(0) 0,DP, P(0) 1,DP Unif(Pers M) for i = 1, 2, . . . , do P 0 = P(t 1) 0,DP + N(0, σ2I2), P 1 = P(t 1) 1,DP + N(0, σ2I2) P = (P 0, P 1) p = min 0, ϵ 2 u D(P(t 1)) u D(P ) U Unif(0, 1) if log U p then P(t) 0,DP = P 0, P(t) 1,DP = P 1 else P(t) 0,DP = P(t 1) 0,DP , P(t) 1,DP = P(t 1) 1,DP end if end for
Open Source Code Yes The R code is available at https://github.com/jwsohn612/DPTDA.
Open Datasets Yes Data is provided at http://bertrand.michel.perso.math.cnrs.fr/Enseignements/TDA/Tuto-Part1.html
Dataset Splits No The paper describes simulation studies where
Hardware Specification No The paper does not provide any specific hardware details used for running its experiments or simulations.
Software Dependencies No The paper mentions "The R code is available" but does not specify the version of R or any other software libraries with version numbers.
Experiment Setup Yes For the exponential mechanism, the default parameters are ϵ = 1, m = 0.2, n = 4000, and M = 5. These hyper-parameters were chosen by preliminary simulations. To sample from the exponential mechanism, we use a Markov chain Monte Carlo algorithm, specified in Appendix C.1. We choose the last iterate out of T = 10000 Monte Carlo diagrams as the reporting privatized diagram to be used for analysis . In Section 6, the parameters are: "We set the size of the sampling space M = 5, the resolution of the L1-DTM m = 0.05, the privacy budget ϵ = 1." and "We ... carry out 50000 iterations in the MCMC procedure in our mechanism"