Information Laundering for Model Privacy
Authors: Xinran Wang, Yu Xiang, Jun Gao, Jie Ding
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide some experimental studies to illustrate the concepts |
| Researcher Affiliation | Academia | Xinran Wang, School of Statistics, University of Minnesota-Twin Cities, Minneapolis, MN 55455, USA, EMAIL; Yu Xiang, Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112, USA, EMAIL; Jun Gao, Department of Mathematics, Stanford University, Stanford, CA 94305, USA, EMAIL; Jie Ding, School of Statistics, University of Minnesota-Twin Cities, Minneapolis, MN 55455, USA, EMAIL |
| Pseudocode | Yes | Algorithm 1 Optimized Information Laundering (OIL) and Algorithm 2 OIL-Y (a special case of Algorithm 1, in the matrix form) |
| Open Source Code | No | The paper does not provide a specific link or statement indicating that its source code is publicly available. |
| Open Datasets | Yes | In this experimental study, we use the 20-newsgroups dataset provided by the scikit-learn open-source library (Scikit-learn, 2020d)... we use the life expectancy dataset provided by Kaggle open-source data (Kaggle, 2020)... Alice uses half of the Breast Cancer dataset (Scikit-learn, 2020b)... |
| Dataset Splits | Yes | To evaluate the out-sample utility, we split the data into two parts using the default option provided in (Scikit-learn, 2020d), which results in a training part (2245 samples, 49914 features) and a testing part (1494 samples, 49914 features). |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., CPU, GPU models, or memory). |
| Software Dependencies | No | The paper mentions using 'scikit-learn open-source library' but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Alice trains a classifier using the Naive Bayes method and records the frequency of observing each category [0.22, 0.27, 0.21, 0.30] (r in Algorithm 2). Then, Alice runs the OIL-Y Algorithm (under a given β2) to obtain the transition probability matrix P ∈ [0, 1]^(4×4). In the regression model, we quantize the output alphabet Y by 30 points equally spaced between µ ± 3σ, where µ, σ represent the mean and the standard deviation of Y in the training data. In Figure 8(a), Alice uses half of the Breast Cancer dataset (Scikit-learn, 2020b) (standardized) to train a Logistic classification model. In Figure 9, Alice used half of the simulated Moons dataset (Scikit-learn, 2020c) (with 1000 samples, 0.1 standard deviation for the noise) to train a Random Forest model. |
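The quoted setup involves two mechanical steps that can be sketched in code: quantizing a continuous output onto 30 equally spaced points in µ ± 3σ, and randomizing released labels through a row-stochastic transition matrix P. The sketch below is illustrative only; the function names and the placeholder matrix `P` are assumptions, standing in for the matrix the paper's OIL-Y algorithm would actually produce.

```python
import numpy as np

def quantize_output(y_train, y, n_points=30):
    """Quantize continuous outputs onto n_points equally spaced values
    in [mu - 3*sigma, mu + 3*sigma], with mu and sigma estimated from
    the training outputs (as described in the experiment setup)."""
    mu, sigma = y_train.mean(), y_train.std()
    grid = np.linspace(mu - 3 * sigma, mu + 3 * sigma, n_points)
    # Map each value to the index of its nearest grid point.
    idx = np.abs(y[:, None] - grid[None, :]).argmin(axis=1)
    return grid, idx

def apply_transition(P, labels, rng):
    """Randomize each label k by sampling a released label from row k of
    the row-stochastic transition matrix P."""
    return np.array([rng.choice(P.shape[1], p=P[k]) for k in labels])

rng = np.random.default_rng(0)

# Class frequencies reported for the 4-class example (vector r in Algorithm 2).
r = np.array([0.22, 0.27, 0.21, 0.30])

# Placeholder row-stochastic 4x4 matrix (NOT the paper's optimized P):
# each row puts 0.7 mass on the true class and 0.1 on each other class.
P = np.full((4, 4), 0.1) + 0.6 * np.eye(4)
assert np.allclose(P.sum(axis=1), 1.0)

# Quantize some simulated regression outputs.
y_train = rng.normal(loc=5.0, scale=2.0, size=500)
grid, idx = quantize_output(y_train, rng.normal(loc=5.0, scale=2.0, size=10))

# Randomize a batch of predicted class labels through P.
released = apply_transition(P, np.array([0, 1, 2, 3, 0, 1]), rng)
```

Here the grid has exactly 30 points spanning six training-set standard deviations, and each released label is drawn from the corresponding row of P, which is how a transition matrix of this shape would perturb a classifier's output in practice.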