Simulation-based Bayesian Inference from Privacy Protected Data

Authors: Yifei Xiong, Nianqiao Ju, Sanguo Zhang

TMLR 2025

Reproducibility (each item lists the Variable, the Result, and the supporting LLM Response):
Research Type: Experimental
LLM Response: "We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms."
Researcher Affiliation: Academia
LLM Response: "Yifei Xiong (EMAIL), Department of Statistics, Purdue University; Nianqiao Phyllis Ju (EMAIL), Department of Statistics, Purdue University; Sanguo Zhang (EMAIL), School of Mathematical Sciences, University of Chinese Academy of Sciences"
Pseudocode: Yes
LLM Response: "Algorithm 1: Sequential private-data posterior estimation (SPPE) ... Algorithm 2: Sequential private-data likelihood estimation (SPLE) ... Algorithm 3: Sequential Monte Carlo Approximate Bayesian Computation (SMC-ABC)"
Open Source Code: Yes
LLM Response: "The code is available on GitHub: https://github.com/Yifei-Xiong/Simulation-based-Bayesian-Inference-from-Privacy-Protected-Data"
Open Datasets: Yes
LLM Response: "We apply our privacy mechanism and inference methods to several real infectious disease outbreaks: influenza, Ebola, and COVID-19. ... Influenza outbreak: we utilized the dataset from a boarding school, obtained from https://search.r-project.org/CRAN/refmans/epimdr/html/flu.html. ... Ebola outbreak in West Africa, 2014: the dataset source is https://apps.who.int/gho/data/node.ebola-sitrep. ... COVID-19: see https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/state/nevada/county/clark-county/."
Dataset Splits: Yes
LLM Response: "In each round of training, we randomly select 5% of the newly generated samples as validation data."
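The per-round 5% validation holdout quoted above can be sketched in a few lines; this is a minimal, framework-agnostic illustration, and the function name, interface, and seed handling are illustrative assumptions rather than the paper's actual code.

```python
import random

def split_round_samples(samples, val_frac=0.05, seed=None):
    """Randomly hold out a fraction of newly generated samples as validation data."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_val = max(1, int(round(val_frac * len(samples))))
    val_idx = set(idx[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_idx]
    val = [s for i, s in enumerate(samples) if i in val_idx]
    return train, val

# Example matching the paper's setting: N = 1000 simulated samples per round,
# of which 5% (50 samples) become validation data.
train, val = split_round_samples(list(range(1000)), val_frac=0.05, seed=0)
```

In a sequential scheme such as SPPE/SPLE, a split like this would be applied to the fresh simulations of each round rather than to a fixed dataset.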
Hardware Specification: Yes
LLM Response: "Our numerical experiments were conducted on a computer equipped with four GeForce RTX 2080 Ti graphics cards and a pair of 14-core Intel E5-2690 v4 CPUs."
Software Dependencies: No
LLM Response: The paper mentions the "Pytorch package in Python" but does not specify version numbers for either Python or PyTorch, which are necessary for reproducible software dependencies.
Experiment Setup: Yes
LLM Response: "We employed neural spline flows (Durkan et al., 2019) as the conditional density estimator, consisting of 8 layers. ... Each layer consists of two residual blocks with 50 units and ReLU activation function, with 10 bins in each monotonic piecewise rational-quadratic transform, and the tail bound was set to 5. ... In the training process, the number of samples simulated in each round is N = 1000 and there are R = 10 rounds in total. ... We stop training if the loss on validation data does not decrease after 20 epochs in a single round. For the stochastic gradient descent optimizer, we choose Adam (Kingma & Ba, 2014) with a batch size of 100, a learning rate of 5 × 10^-4, and a weight decay of 10^-4."
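The early-stopping rule quoted above (end a training round once the validation loss has not improved for 20 epochs) can be sketched independently of any deep learning framework; the class name and interface below are illustrative assumptions, not the paper's implementation.

```python
class EarlyStopping:
    """Stop a training round once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

In the setup described, a monitor like this would wrap the epoch loop inside each of the R = 10 rounds, alongside an Adam optimizer configured with a batch size of 100, a learning rate of 5 × 10^-4, and a weight decay of 10^-4.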