Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalized Linear Models in Non-interactive Local Differential Privacy with Public Data

Authors: Di Wang, Lijie Hu, Huanyu Zhang, Marco Gaboardi, Jinhui Xu

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets."
Researcher Affiliation | Collaboration | Di Wang (CEMSE, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia); Lijie Hu (CEMSE, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia); Huanyu Zhang (Meta, New York, NY, USA); Marco Gaboardi (Department of Computer Science, Boston University, Boston, MA 02215, USA); Jinhui Xu (Department of Computer Science and Engineering, University at Buffalo, SUNY, Buffalo, NY 14260, USA)
Pseudocode | Yes | Algorithm 1: Non-interactive LDP for smooth GLMs with public data (Gaussian); Algorithm 2: Non-interactive LDP for smooth GLMs with public data (General); Algorithm 3: Non-interactive LDP for smooth non-linear regression with public data (Gaussian); Algorithm 4: Non-interactive LDP for smooth non-linear regression with public data (General); Algorithm 5: 2-round LDP for smooth GLMs with public data (Gaussian)
Open Source Code | No | The paper mentions using the "Logistic Regression classifier in the scikit-learn library (Pedregosa et al., 2011)" and "standard gradient descent as the baseline method", but does not provide access to the authors' own implementation of the described methodology.
Open Datasets | Yes | "We conduct experiments on binary logistic regression for GLMs on the Covertype dataset (Dua and Graff, 2017), the SUSY dataset (Baldi et al., 2014) and the Skin Segmentation dataset (Dua and Graff, 2017)."
Dataset Splits | Yes | "We divide the data into training data and test data, where n_training = 350,000 and n_testing = 200,000 (other data will be used as the public unlabeled data)... For the SUSY dataset... n_training = 450,000 and n_testing = 30,000... For the Skin Segmentation dataset... n_training = 180,000 and n_testing = 5,000."
Hardware Specification | No | The paper does not provide hardware details (e.g., GPU/CPU models, memory) for its experiments; it only discusses experimental settings and software.
Software Dependencies | No | The paper mentions the "Logistic Regression classifier in the scikit-learn library (Pedregosa et al., 2011)" but does not specify its version or any other versioned software dependencies.
Experiment Setup | Yes | "For privacy parameters, we will choose ϵ between 4 and 20 and set δ = 1/n^1.1. For dimension p, we choose from the set {5, 10, 15, 20, 25, 30, 40, 50, 60}. For different experiments, we will vary the private sample size n. However, we will always set the size of the public unlabeled data m to be smaller than n; unless otherwise specified, we set m = n/p²."
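The quoted setup above can be sketched as a small configuration helper. This is a minimal illustration, assuming the reconstructed formulas δ = 1/n^1.1 and m = n/p² from the quoted text; the function and variable names are hypothetical and not taken from the authors' (unreleased) code.

```python
def experiment_config(n_total, n_training, n_testing, p, epsilon=4):
    """Split sizes and privacy parameters for one run, per the quoted setup.

    Illustrative only: names and structure are assumptions, not the paper's code.
    """
    n = n_training                                # private (labeled) sample size
    delta = 1.0 / n ** 1.1                        # delta = 1/n^1.1
    m = n // p ** 2                               # public unlabeled size m = n/p^2 < n
    n_public_pool = n_total - n_training - n_testing  # remaining rows form the public pool
    assert m <= n_public_pool, "public pool must be large enough to draw m points"
    return {"n": n, "delta": delta, "m": m, "epsilon": epsilon}

# Covertype-style setting: 581,012 rows total, 350,000 train / 200,000 test
cfg = experiment_config(n_total=581_012, n_training=350_000,
                        n_testing=200_000, p=10)
print(cfg["m"])   # 3500 public unlabeled points when p = 10
```

Note that m = n/p² shrinks quickly with the dimension, so the public unlabeled sample is always far smaller than the private training set, consistent with the constraint m < n stated in the quote.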