Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Generalized Linear Models in Non-interactive Local Differential Privacy with Public Data
Authors: Di Wang, Lijie Hu, Huanyu Zhang, Marco Gaboardi, Jinhui Xu
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. |
| Researcher Affiliation | Collaboration | Di Wang EMAIL CEMSE King Abdullah University of Science and Technology Thuwal, Saudi Arabia Lijie Hu EMAIL CEMSE King Abdullah University of Science and Technology Thuwal, Saudi Arabia Huanyu Zhang EMAIL Meta New York, NY, USA Marco Gaboardi EMAIL Department of Computer Science Boston University Boston, MA 02215, USA Jinhui Xu EMAIL Department of Computer Science and Engineering University at Buffalo, SUNY Buffalo, NY 14260, USA |
| Pseudocode | Yes | Algorithm 1 Non-interactive LDP for smooth GLMs with public data (Gaussian) Algorithm 2 Non-interactive LDP for smooth GLMs with public data (General) Algorithm 3 Non-interactive LDP for smooth non-linear regression with public data (Gaussian) Algorithm 4 Non-interactive LDP for smooth non-linear regression with public data (General) Algorithm 5 2-round LDP for smooth GLMs with public data (Gaussian) |
| Open Source Code | No | The paper mentions using a 'Logistic Regression classifier in the scikit-learn library (Pedregosa et al., 2011)' and 'standard gradient descent as the baseline method', but does not provide specific access to the authors' own implementation code for the methodology described. |
| Open Datasets | Yes | We conduct experiments on binary logistic regression for GLMs on the Covertype dataset (Dua and Graff, 2017), the SUSY dataset (Baldi et al., 2014) and the Skin Segmentation dataset (Dua and Graff, 2017). |
| Dataset Splits | Yes | We divide the data into training data and test data, where n_training = 350,000 and n_testing = 200,000 (other data will be used as the public unlabeled data)... For the SUSY dataset... n_training = 450,000 and n_testing = 30,000... For the Skin Segmentation dataset... n_training = 180,000 and n_testing = 5,000. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only discusses experimental settings and software. |
| Software Dependencies | No | The paper mentions using the 'Logistic Regression classifier in the scikit-learn library (Pedregosa et al., 2011)' but does not specify its version number or any other software dependencies with versions. |
| Experiment Setup | Yes | For privacy parameters, we will choose ϵ between 4 and 20 and set δ = 1/n^1.1. For dimension p, we choose from the set {5, 10, 15, 20, 25, 30, 40, 50, 60}. For different experiments, we will vary different private sample sizes n. However, we will always set the size of public unlabeled data m to be smaller than n. Specifically, without specification, we will always set m = n/p². |
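The quoted setup ties the public-data size and the δ parameter directly to n and p. A minimal sketch of those parameter choices, assuming only the formulas δ = 1/n^1.1 and m = n/p² quoted above (the helper function and its name are illustrative, not from the paper):

```python
def experiment_params(n: int, p: int, eps: float) -> dict:
    """Privacy budget (eps, delta) and public unlabeled sample size m
    for private sample size n and dimension p, per the quoted setup."""
    assert 4 <= eps <= 20, "the paper varies epsilon between 4 and 20"
    delta = 1.0 / n ** 1.1   # delta = 1 / n^1.1
    m = n // p ** 2          # m = n / p^2, always smaller than n
    return {"epsilon": eps, "delta": delta, "m": m}

# e.g. the Covertype training split with p = 10:
params = experiment_params(n=350_000, p=10, eps=4.0)
```

For n = 350,000 and p = 10 this gives m = 3,500 public samples, illustrating how quickly the public-data requirement shrinks relative to the private sample size.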