Privacy-Preserving Energy-Based Generative Models for Marginal Distribution Protection
Authors: Robert E. Tillman, Tucker Balch, Manuela Veloso
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate this approach using financial and healthcare datasets and demonstrate that the resulting learnt generative models produce high fidelity synthetic data while preserving privacy. We also show that PPEMs can incorporate both α-LMDP and DP in contexts where both forms of privacy are required. ... Using credit card data and electronic healthcare records, we empirically demonstrate that PPEMs produce high fidelity synthetic data while preserving privacy. |
| Researcher Affiliation | Industry | Robert E. Tillman, Optum AI Labs (United Health Group); Tucker Balch, J.P. Morgan Chase AI Research; Manuela Veloso, J.P. Morgan Chase AI Research |
| Pseudocode | Yes | Pseudocode for training and sampling is provided in Appendix A and proofs are provided in Appendix B. Code is also provided in the attached supplement. |
| Open Source Code | Yes | Pseudocode for training and sampling is provided in Appendix A and proofs are provided in Appendix B. Code is also provided in the attached supplement. |
| Open Datasets | Yes | We next apply PPEMs to real financial and healthcare datasets that have previously been used to benchmark privacy-preserving generative models: the Kaggle credit card fraud dataset (Pozzolo et al., 2015), used as the primary evaluation dataset for PATE-GAN, consists of 28 factors used to predict whether a transaction is fraudulent and the transaction amount; the MIMIC-III critical care electronic healthcare record (EHR) dataset (Johnson et al., 2016) consists of binary indicators for diagnoses patients received. ... The license for this dataset is available at https://opendatacommons.org/licenses/dbcl/1-0/. ... The license for this dataset is available at https://physionet.org/content/mimiciii/view-license/1.4/. |
| Dataset Splits | No | The paper does not explicitly provide information on dataset splits (e.g., training, validation, test percentages or counts). It only mentions using a minibatch size of 128 for training. |
| Hardware Specification | Yes | All experiments were run using a single NVIDIA T4 GPU. |
| Software Dependencies | No | The paper mentions using a public implementation for DP-GAN and PATE-GAN, but does not list specific software dependencies (e.g., Python, PyTorch, CUDA versions) used for their own PPEM models. |
| Experiment Setup | Yes | D.1 Hyperparameters. Below are the hyperparameters used in all experiments with PPEM models: m = 10; λα = 10; λD = 1; training epochs = 100; minibatch size = 128; energy model iterations per generator iteration = 5; MLP layer dimensions (all networks) = 256; latent dimensions (both generators) = 128; generator learning rate = 5e-4; energy model learning rate = 1e-3; α-level = 0.05; (ϵ, δ) = (1, n⁻¹) |
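The reported hyperparameters can be collected into a single configuration sketch for anyone attempting a re-run. The dictionary keys and the `delta_for` helper below are illustrative names, not identifiers from the authors' supplement; only the values come from Appendix D.1 of the paper.

```python
# Hyperparameters reported in Appendix D.1 for all PPEM experiments.
# Key names are assumptions for illustration; values are from the paper.
ppem_config = {
    "m": 10,                            # m = 10 (paper's notation)
    "lambda_alpha": 10,                 # λα = 10
    "lambda_D": 1,                      # λD = 1
    "epochs": 100,                      # training epochs
    "minibatch_size": 128,
    "energy_iters_per_gen_iter": 5,     # energy model iterations per generator iteration
    "mlp_layer_dim": 256,               # all networks
    "latent_dim": 128,                  # both generators
    "generator_lr": 5e-4,
    "energy_model_lr": 1e-3,
    "alpha_level": 0.05,
    "dp_epsilon": 1,                    # (ϵ, δ) = (1, n⁻¹)
}

def delta_for(n: int) -> float:
    """DP δ reported as n⁻¹, where n is the training-set size."""
    return 1.0 / n
```

A re-implementation would still need the unreported details noted above (e.g., dataset splits and exact software versions), so this config alone does not fully determine the experimental setup.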