Information Laundering for Model Privacy
Authors: Xinran Wang, Yu Xiang, Jun Gao, Jie Ding
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide some experimental studies to illustrate the concepts |
| Researcher Affiliation | Academia | Xinran Wang, School of Statistics, University of Minnesota-Twin Cities, Minneapolis, MN 55455, USA, EMAIL; Yu Xiang, Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112, USA, EMAIL; Jun Gao, Department of Mathematics, Stanford University, Stanford, CA 94305, USA, EMAIL; Jie Ding, School of Statistics, University of Minnesota-Twin Cities, Minneapolis, MN 55455, USA, EMAIL |
| Pseudocode | Yes | Algorithm 1 Optimized Information Laundering (OIL) and Algorithm 2 OIL-Y (a special case of Algorithm 1, in the matrix form) |
| Open Source Code | No | The paper does not provide a specific link or statement indicating that its source code is publicly available. |
| Open Datasets | Yes | In this experimental study, we use the 20-newsgroups dataset provided by the scikit-learn open-source library (Scikit-learn, 2020d)... we use the life expectancy dataset provided by Kaggle open-source data (Kaggle, 2020)... Alice uses half of the Breast Cancer dataset (Scikit-learn, 2020b)... |
| Dataset Splits | Yes | To evaluate the out-sample utility, we split the data into two parts using the default option provided in (Scikit-learn, 2020d), which results in a training part (2245 samples, 49914 features) and a testing part (1494 samples, 49914 features). |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., CPU, GPU models, or memory). |
| Software Dependencies | No | The paper mentions using 'scikit-learn open-source library' but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Alice trains a classifier using the Naive Bayes method and records the frequency of observing each category [0.22, 0.27, 0.21, 0.30] (r in Algorithm 2). Then, Alice runs the OIL-Y Algorithm (under a given β2) to obtain the transition probability matrix P ∈ [0, 1]^(4×4). In the regression model, we quantize the output alphabet Y by 30 points equally spaced between µ ± 3σ, where µ, σ represent the mean and the standard deviation of Y in the training data. In Figure 8(a), Alice uses half of the Breast Cancer dataset (Scikit-learn, 2020b) (standardized) to train a Logistic classification model. In Figure 9, Alice used half of the simulated Moons dataset (Scikit-learn, 2020c) (with 1000 samples, 0.1 standard deviation for the noise) to train a Random Forest model. |
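The quoted setup involves two mechanical steps that can be sketched in code: quantizing a continuous output onto 30 equally spaced points in µ ± 3σ, and randomizing released labels through a row-stochastic transition matrix P. The sketch below is illustrative only; the function names and the placeholder matrix `P` are assumptions, standing in for the matrix the paper's OIL-Y algorithm would actually produce.

```python
import numpy as np

def quantize_output(y_train, y, n_points=30):
    """Quantize continuous outputs onto n_points equally spaced values
    in [mu - 3*sigma, mu + 3*sigma], with mu and sigma estimated from
    the training outputs (as described in the experiment setup)."""
    mu, sigma = y_train.mean(), y_train.std()
    grid = np.linspace(mu - 3 * sigma, mu + 3 * sigma, n_points)
    # Map each value to the index of its nearest grid point.
    idx = np.abs(y[:, None] - grid[None, :]).argmin(axis=1)
    return grid, idx

def apply_transition(P, labels, rng):
    """Randomize each label k by sampling a released label from row k of
    the row-stochastic transition matrix P."""
    return np.array([rng.choice(P.shape[1], p=P[k]) for k in labels])

rng = np.random.default_rng(0)

# Class frequencies reported for the 4-class example (vector r in Algorithm 2).
r = np.array([0.22, 0.27, 0.21, 0.30])

# Placeholder row-stochastic 4x4 matrix (NOT the paper's optimized P):
# each row puts 0.7 mass on the true class and 0.1 on each other class.
P = np.full((4, 4), 0.1) + 0.6 * np.eye(4)
assert np.allclose(P.sum(axis=1), 1.0)

# Quantize some simulated regression outputs.
y_train = rng.normal(loc=5.0, scale=2.0, size=500)
grid, idx = quantize_output(y_train, rng.normal(loc=5.0, scale=2.0, size=10))

# Randomize a batch of predicted class labels through P.
released = apply_transition(P, np.array([0, 1, 2, 3, 0, 1]), rng)
```

Here the grid has exactly 30 points spanning six training-set standard deviations, and each released label is drawn from the corresponding row of P, which is how a transition matrix of this shape would perturb a classifier's output in practice.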