Simultaneous Private Learning of Multiple Concepts

Authors: Mark Bun, Kobbi Nissim, Uri Stemmer

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type: Theoretical
We investigate the direct-sum problem in the context of differentially private PAC learning: what is the sample complexity of solving k learning tasks simultaneously under differential privacy, and how does this cost compare to that of solving k learning tasks without privacy? In our setting, an individual example consists of a domain element x labeled by k unknown concepts (c_1, ..., c_k). The goal of a multi-learner is to output k hypotheses (h_1, ..., h_k) that generalize the input examples. Without concern for privacy, the sample complexity needed to simultaneously learn k concepts is essentially the same as that needed to learn a single concept. Under differential privacy, the basic strategy of learning each hypothesis independently yields a sample complexity that grows polynomially with k. For some concept classes, we give multi-learners that require fewer samples than the basic strategy. Unfortunately, however, we also give lower bounds showing that even for very simple concept classes, the sample cost of private multi-learning must grow polynomially in k.
Keywords: differential privacy, PAC learning, agnostic learning, direct-sum
Researcher Affiliation: Academia
Mark Bun, Department of Computer Science, Boston University; Kobbi Nissim, Department of Computer Science, Georgetown University; Uri Stemmer, Department of Computer Science, Ben-Gurion University
Pseudocode: Yes
Algorithm 1 (A_dist). Input: privacy parameters ϵ, δ, database S ∈ X*, sensitivity-1 quality function q...
Algorithm 2 (Query release for POINT_X). Input: privacy parameters (ϵ, δ), database D ∈ X^n...
Algorithm 3 (Generic Learner). Input: concept class C, privacy parameters ϵ', ϵ, δ, and a k-labeled database S = (x_i, y_{i,1}, ..., y_{i,k})_{i=1}^n...
Algorithm 4 (Parity Learner). Input: parameters ϵ, δ, and a k-labeled database S of size n = O(d βδ)...
Algorithm 5 (Point Learner). Input: privacy parameters ϵ, δ, and a k-labeled database S = (x_i, y_{i,1}, ..., y_{i,k})_{i=1}^n.
Open Source Code: No
The paper focuses on theoretical analysis, definitions, theorems, and proofs. There is no mention of a code release, repository links, or supplementary materials containing code for the methodologies described.

Open Datasets: No
The paper is theoretical and reasons about abstract objects such as a "domain X" and a "k-labeled database". It does not perform experiments on concrete datasets, nor does it provide access information for any dataset.

Dataset Splits: No
The paper is theoretical and does not involve empirical experiments on datasets, so there is no discussion or provision of training, validation, or test splits.

Hardware Specification: No
The paper is purely theoretical, focusing on mathematical proofs, algorithms, and sample-complexity analysis. It describes no experiments that would require specific hardware, so no hardware specifications are mentioned.

Software Dependencies: No
The paper is theoretical and includes no experimental section or implementation details, so no software dependencies or version numbers are listed.

Experiment Setup: No
The paper presents algorithms and proofs rather than empirical experiments, so it includes no hyperparameter values, training configurations, or other system-level experimental settings.
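The "basic strategy" the abstract refers to, learning each of the k hypotheses independently, can be sketched concretely. The sketch below is illustrative and not taken from the paper: private_erm and naive_multi_learner are hypothetical helper names, the single-concept learner is a report-noisy-min over a finite hypothesis class, and the multi-learner simply splits the privacy budget ϵ/k per task, which is valid by basic composition.

```python
import random

def private_erm(examples, labels, hypotheses, epsilon):
    """Toy epsilon-DP learner over a finite hypothesis class:
    report-noisy-min on empirical error counts. Each count has
    sensitivity 1, so Laplace noise of scale 2/epsilon suffices."""
    def error(h):
        return sum(1 for x, y in zip(examples, labels) if h(x) != y)
    # Laplace(2/epsilon) noise, drawn as a difference of two Exp(epsilon/2) draws.
    noisy = [error(h)
             + random.expovariate(epsilon / 2)
             - random.expovariate(epsilon / 2)
             for h in hypotheses]
    return hypotheses[noisy.index(min(noisy))]

def naive_multi_learner(examples, label_matrix, hypotheses, epsilon):
    """The 'basic strategy': split the budget evenly and learn the k
    concepts independently. By basic composition the k calls together
    satisfy epsilon-DP, but each task only gets an epsilon/k budget."""
    k = len(label_matrix[0])
    return [private_erm(examples,
                        [labels[j] for labels in label_matrix],
                        hypotheses,
                        epsilon / k)
            for j in range(k)]
```

Because each independent learner runs at budget ϵ/k, its sample requirement scales up accordingly, which is exactly the polynomial-in-k growth the abstract attributes to this strategy.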
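Algorithm 3 ("Generic Learner") is listed above only by its input signature. A standard building block for generic private learners of this kind is the exponential mechanism, which selects a hypothesis with probability decaying exponentially in its empirical error. The sketch below shows that mechanism under the assumption of a finite hypothesis class; it is a hedged illustration, not the paper's actual Algorithm 3, and generic_private_learner is a hypothetical name.

```python
import math
import random

def generic_private_learner(examples, labels, hypotheses, epsilon):
    """Exponential mechanism over a finite hypothesis class: sample h
    with probability proportional to exp(-epsilon * err(h) / 2), where
    err(h) is the empirical error count (sensitivity 1). The selection
    is epsilon-differentially private."""
    errors = [sum(1 for x, y in zip(examples, labels) if h(x) != y)
              for h in hypotheses]
    m = min(errors)  # shift for numerical stability; leaves the distribution unchanged
    weights = [math.exp(-epsilon * (e - m) / 2) for e in errors]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for h, w in zip(hypotheses, weights):
        acc += w
        if r <= acc:
            return h
    return hypotheses[-1]
```

With high probability the sampled hypothesis has empirical error close to the minimum, which is what makes this selection rule usable as a private agnostic learner for finite classes.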