Privately Counting Partially Ordered Data
Authors: Matthew Joseph, Mónica Ribero, Alexander Yu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section evaluates our K-norm mechanism on a variety of poset structures. First, we derive a general result about the squared expected ℓ2 norm of ℓp balls (Section 4.1). Based on this result, the strongest comparison for our algorithm is the ℓ∞ mechanism. We use this as the baseline for experiments on path posets, random posets (Section 4.3) and the National Health Interview Survey (Services & Medicaid, 2024) (Section 4.4). An evaluation of runtime appears in Section 4.5. |
| Researcher Affiliation | Industry | Matthew Joseph, Mónica Ribero & Alexander Yu, Google Research NY. |
| Pseudocode | Yes | Pseudocode appears in Algorithm 1. Algorithm 1 (Poset Ball Sampler): 1: Input: Poset P satisfying Assumption 2.8. 2: Uniformly sample an extended bipartition (N+, N−, A, B) (Lemma 3.14). 3: Convert (N+, N−, A, B) into its non-interfering chain C = (C+, C−) (Lemma 3.13). 4: Compute the vertices of the simplex F(C) = F(C+) F(C−) (Lemma 3.11). 5: Return a uniform sample from F(C) (Lemma 3.6). |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The final set of experiments uses the National Health Interview Survey (NHIS) (Services & Medicaid, 2024). As described in the introduction, the survey includes or omits certain questions depending on previous answers. This induces a poset, and our experiments use either the first, first two, or first three sections of the survey (Hypertension, Cholesterol, and Asthma). The resulting posets have size from d = 4 to d = 15. As shown in Figure 6, our mechanism roughly halves the error of the baseline mechanism. |
| Dataset Splits | No | The paper mentions running experiments for a certain number of trials (e.g., '100 trials', '10,000 trials') and generating random posets, but does not provide specific train/test/validation splits for a fixed dataset, nor does it refer to standard predefined splits for machine learning tasks. |
| Hardware Specification | No | On a 2 CPU machine with 32GB RAM, our method takes less than half a second for any of the d used in our experiments. The memory amount (32GB RAM) and the number of CPUs (2) are mentioned, but the specific CPU model or processor type is not provided. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies, libraries, or solvers with version numbers. |
| Experiment Setup | No | The paper discusses the theoretical framework and experimental results, including comparisons of noise mechanisms, but does not specify concrete experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or specific optimizer settings typically found in machine learning experiments. It mentions 'fixing privacy parameters' but does not state their values. |
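
The last step of Algorithm 1 (a uniform sample from a simplex with known vertices) and the way a K-norm mechanism turns a uniform ball sample into noise can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the `rng` argument are hypothetical, sensitivity is assumed to be 1 in the chosen norm, and steps 2 to 4 of Algorithm 1 (bipartition sampling, chain conversion, vertex computation) are not implemented. The ℓ∞ baseline is written as the K-norm mechanism whose ball is the unit ℓ∞ ball.

```python
import numpy as np


def sample_simplex_uniform(vertices, rng):
    """Uniform sample from the simplex spanned by affinely independent vertices."""
    vertices = np.asarray(vertices, dtype=float)
    # Dirichlet(1, ..., 1) weights are uniform over barycentric coordinates,
    # and an affine map of a uniform distribution stays uniform on the image.
    weights = rng.dirichlet(np.ones(len(vertices)))
    return weights @ vertices


def k_norm_noise(ball_sample, epsilon, rng):
    """Scale a uniform sample from a norm ball K by a Gamma(d + 1, 1/epsilon) radius.

    Assumes the statistic has sensitivity 1 in the norm induced by K.
    """
    ball_sample = np.asarray(ball_sample, dtype=float)
    d = ball_sample.shape[0]
    radius = rng.gamma(shape=d + 1, scale=1.0 / epsilon)
    return radius * ball_sample


def linf_mechanism_noise(d, epsilon, rng):
    """Baseline sketch: K-norm noise with K the unit l_inf ball, i.e. [-1, 1]^d."""
    return k_norm_noise(rng.uniform(-1.0, 1.0, size=d), epsilon, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical triangle standing in for the simplex F(C).
    triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(sample_simplex_uniform(triangle, rng))
    print(linf_mechanism_noise(d=4, epsilon=1.0, rng=rng))
```

A poset-specific mechanism would replace the ℓ∞ ball sample with the output of Algorithm 1 and reuse the same radius scaling. The privacy parameter `epsilon=1.0` above is a placeholder, since the paper's chosen values are not reported here.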