Simulation-based Bayesian Inference from Privacy Protected Data
Authors: Yifei Xiong, Nianqiao Ju, Sanguo Zhang
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms. |
| Researcher Affiliation | Academia | Yifei Xiong EMAIL Department of Statistics Purdue University Nianqiao Phyllis Ju EMAIL Department of Statistics Purdue University Sanguo Zhang EMAIL School of Mathematical Sciences University of Chinese Academy of Sciences |
| Pseudocode | Yes | Algorithm 1 Sequential private-data posterior estimation (SPPE) ... Algorithm 2 Sequential private-data likelihood estimation (SPLE) ... Algorithm 3 Sequential Monte Carlo Approximate Bayesian Computation (SMC-ABC) |
| Open Source Code | Yes | The code is available on GitHub: https://github.com/Yifei-Xiong/Simulation-based-Bayesian-Inference-from-Privacy-Protected-Data |
| Open Datasets | Yes | We apply our privacy mechanism and inference methods to several real infectious disease outbreaks: influenza, Ebola, and COVID-19. ... influenza outbreak. We utilized the dataset from a boarding school, obtained from https://search.r-project.org/CRAN/refmans/epimdr/html/flu.html. ... Ebola outbreak in West Africa, 2014. ... The dataset source is from https://apps.who.int/gho/data/node.ebola-sitrep. ... COVID-19. ... See https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/state/nevada/county/clark-county/. |
| Dataset Splits | Yes | In each round of training, we randomly select 5% of the newly generated samples as validation data. |
| Hardware Specification | Yes | Our numerical experiments were conducted on a computer equipped with four GeForce RTX 2080 Ti graphics cards and a pair of 14-core Intel E5-2690 v4 CPUs. |
| Software Dependencies | No | The paper mentions 'Pytorch package in Python' but does not specify version numbers for either Python or Pytorch, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | We employed neural spline flows (Durkan et al., 2019) as the conditional density estimator, consisting of 8 layers. ... each layer consists of two residual blocks with 50 units and ReLU activation function, with 10 bins in each monotonic piecewise rational-quadratic transform, and the tail bound was set to 5. ... In the training process, the number of samples simulated in each round is N = 1000 and there are R = 10 rounds in total. ... we stop training if the value of loss on validation data does not decrease after 20 epochs in a single round. For stochastic gradient descent optimizer, we choose the Adam (Kingma & Ba, 2014) with the batch size of 100, the learning rate of $5 \times 10^{-4}$ and the weight decay is $10^{-4}$. |
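
The training settings quoted in the Dataset Splits and Experiment Setup rows can be collected into a small configuration sketch. The block below is illustrative only and is not taken from the authors' repository: `TrainConfig`, `split_validation`, and `stopping_epoch` are hypothetical names, and reading "does not decrease after 20 epochs" as a patience counter on the best validation loss is an assumption about the stopping rule.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values quoted in the table; the field names are hypothetical."""
    rounds: int = 10              # R, number of sequential rounds
    sims_per_round: int = 1000    # N, samples simulated in each round
    val_fraction: float = 0.05    # share of new samples held out per round
    patience: int = 20            # epochs without improvement before stopping
    batch_size: int = 100
    learning_rate: float = 5e-4   # Adam learning rate
    weight_decay: float = 1e-4


def split_validation(n_new, val_fraction, rng):
    """Randomly split indices of newly simulated samples into (train, val)."""
    indices = list(range(n_new))
    rng.shuffle(indices)
    n_val = int(round(n_new * val_fraction))
    return indices[n_val:], indices[:n_val]


def stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop.

    Assumed rule: stop once the validation loss has failed to improve
    on its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1


cfg = TrainConfig()
train_idx, val_idx = split_validation(
    cfg.sims_per_round, cfg.val_fraction, random.Random(0)
)
```

With N = 1000 and a 5% hold-out, each round trains on 950 of the new samples and validates on the remaining 50.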