Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Revisiting Model-Agnostic Private Learning: Faster Rates and Active Learning
Authors: Chong Liu, Yuqing Zhu, Kamalika Chaudhuri, Yu-Xiang Wang
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, our experiments on real-life datasets demonstrate that PATE-ASQ achieves significantly better accuracy than standard PATE algorithms while incurring the same or lower privacy loss. |
| Researcher Affiliation | Academia | Chong Liu EMAIL Yuqing Zhu EMAIL Kamalika Chaudhuri EMAIL Yu-Xiang Wang EMAIL Department of Computer Science, University of California, Santa Barbara, CA 93106, USA Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA |
| Pseudocode | Yes | Algorithm 1 Standard PATE (Papernot et al., 2018), Algorithm 2 SVT-based PATE (Bassily et al., 2018b), Algorithm 3 PATE-PSQ, Algorithm 4 Disagreement-Based Active Learning (Hanneke, 2014), Algorithm 5 PATE-ASQ |
| Open Source Code | No | The authors would like to thank Songbai Yan for sharing their code with a practical implementation of the disagreement-based active learning algorithms from Yan et al. (2018). (This refers to code from Yan et al. (2018) not the authors' own implementation for this paper. No other statement regarding code release is found.) |
| Open Datasets | Yes | Datasets. We do our experiments on three binary classification datasets, mushroom, a9a, and real-sim. All of them are obtained from the LIBSVM dataset website: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| Dataset Splits | Yes | For all datasets, 80% of all data points are randomly selected to be considered private and used to train teacher classifiers. 2% of all data points are randomly selected as public student unlabeled data points. The remaining 18% of data points are reserved for testing. We repeat this random selection process 30 times. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) are provided in the paper. |
| Software Dependencies | No | All privacy accounting and calibration are conducted via Auto DP (Wang et al., 2019), and the tight analytical calibration and composition of Gaussian mechanisms are due to (Balle and Wang, 2018). (Specific version numbers for Auto DP or other software components are not provided.) |
| Experiment Setup | Yes | Number of teachers K is set on all datasets so that each teacher classifier gets trained with approximately 100 data points. 30% of student unlabeled data points are set as the total budget of queries for PATE-ASQ and PATE-ASQ-NP. See Table 3. ϵ = 0.5, 1.0, 2.0 and δ = 1/n are set as privacy parameters for all datasets, where n is number of private teacher data points. |
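The split protocol reported above (80% private teacher data, 2% public student data, 18% test, repeated 30 times, with the teacher count K chosen so each teacher trains on roughly 100 points) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code; the function name `split_dataset` and the use of numpy are our own assumptions.

```python
import numpy as np

def split_dataset(n, seed):
    """Partition n data-point indices into the 80/2/18 split described
    in the paper: private teacher data / public unlabeled student data /
    test data. Illustrative sketch only (no official code was released)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_teacher = int(0.80 * n)   # 80%: private data for training teachers
    n_student = int(0.02 * n)   # 2%: public unlabeled student queries
    teacher = idx[:n_teacher]
    student = idx[n_teacher:n_teacher + n_student]
    test = idx[n_teacher + n_student:]          # remaining ~18%: test set
    K = max(1, len(teacher) // 100)             # ~100 points per teacher
    return teacher, student, test, K

# The paper repeats the random split 30 times; one run per seed.
splits = [split_dataset(10000, seed) for seed in range(30)]
```

Note that `delta = 1 / n_teacher` would then give the paper's privacy parameter δ = 1/n, with n the number of private teacher data points.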