Lightweight Protocols for Distributed Private Quantile Estimation
Authors: Anders Aamand, Fabrizio Boninsegna, Abigail Gentle, Jacob Imola, Rasmus Pagh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we demonstrate that the algorithm in Theorem 1.1 performs favorably compared to known non-adaptive mechanisms as well as a more naive noisy binary search mechanism. |
| Researcher Affiliation | Academia | BARC and Department of Computer Science, University of Copenhagen, Denmark; Department of Information Engineering, University of Padova, Italy; School of Computer Science, University of Sydney, Australia. |
| Pseudocode | Yes | Algorithm 1 BayeSS main steps... Algorithm 2 BayesLearn for empirical quantile estimation, from Algorithm 2 in (Gretta & Price, 2024)... Algorithm 3 Shuffled Noisy Binary Search for quantile estimation... Algorithm 4 DPBayeSS for empirical quantile estimation with local differential privacy, from Algorithm 3 in (Gretta & Price, 2024) |
| Open Source Code | Yes | Our code is freely available. https://github.com/NynsenFaber/Quantile_estimation_with_adaptive_LDP |
| Open Datasets | No | Data Generation: The income dataset was generated using a Pareto distribution p(x) ∝ 1/x^(γ+1), a well-studied distribution for modeling income data (Arnold, 2014). We generate n ∈ {2500, 5000, 7500} positive integers by sampling from the continuous Pareto distribution with shape γ = 1.5 and multiplicative factor 2000, and then rounding them. For different coin domains [B] we clip the dataset to get integer values in [B]. To compare DPBayeSS and DPNaiveNBS across various coin domains [B] for a fixed privacy budget, we generated n = 2500 integers by sampling from a uniform distribution over a random interval within [B]. |
| Dataset Splits | No | The paper describes how synthetic data was generated for the experiments and notes that each algorithm was executed 200 times to compute the empirical cumulative distribution of the absolute quantile error. However, it does not specify explicit training, validation, or test splits in the sense of machine learning model development; the focus is on evaluating algorithm performance on generated data, not on training and testing a model with distinct splits. |
| Hardware Specification | Yes | All experiments were carried out using an Intel Xeon Processor W-2245 (8 cores, 3.9GHz), 128GB RAM, Ubuntu 20.04.3, and Python 3.11. |
| Software Dependencies | No | The paper mentions 'Python 3.11' as part of the hardware setup, but does not list multiple key software components with specific version numbers (e.g., libraries or frameworks), which would be needed to meet the requirement for reproducible software dependencies. |
| Experiment Setup | Yes | I.1. Hyper-Parameter Selection: To determine the optimal parameter for updating DPBayesLearn given a fixed number of users n, coins B, and varying privacy budgets ε ∈ {0.5, 1, 1.5}, we conducted experiments using DPBayeSS with different update parameters α_update = c·√(log(B)/n). These experiments were performed on two distinct datasets generated by sampling from a Pareto distribution; the outcomes are illustrated in Figure 2. By analyzing different α_update values and the error distribution across various privacy budgets, we observed that the algorithm performs poorly at c = 0.1. However, within the range c ∈ [0.3, 0.7] the performance stabilizes, and the precision decreases for c > 0.7. Therefore, an effective range for the parameter c is [0.3, 0.7]. Based on this analysis, we chose to use c = 0.6. |
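The data-generation recipe quoted in the Open Datasets row (Pareto samples with shape γ = 1.5 and scale 2000, rounded and clipped to the coin domain [B]) can be sketched in a few lines. This is a hedged illustration, not the authors' released code: the function name `generate_income_data`, the seed handling, and the use of NumPy's Lomax-form `pareto` sampler (shifted and scaled to recover the classical Pareto distribution) are assumptions.

```python
import numpy as np

def generate_income_data(n, B, gamma=1.5, scale=2000, seed=0):
    """Sample n incomes from a Pareto(gamma) distribution with the given
    multiplicative scale, round to integers, and clip into the coin
    domain [1, B]. NumPy's rng.pareto draws from the Lomax (Pareto II)
    form on [0, inf), so we shift by 1 and multiply by the scale to
    obtain the classical Pareto distribution with minimum `scale`."""
    rng = np.random.default_rng(seed)
    samples = scale * (1.0 + rng.pareto(gamma, size=n))
    return np.clip(np.round(samples).astype(int), 1, B)

# One of the dataset sizes used in the paper, with a hypothetical domain B.
data = generate_income_data(n=2500, B=10**6)
```

Rounding before clipping mirrors the quoted description ("rounding them... we clip the dataset to get integer values in [B]"); a different clipping convention would change only the boundary mass.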
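The update parameter in the Experiment Setup row, α_update = c·√(log(B)/n) with c = 0.6, is simple enough to state as code. A minimal sketch, assuming the natural logarithm; the helper name `alpha_update` is illustrative, not from the paper.

```python
import math

def alpha_update(n: int, B: int, c: float = 0.6) -> float:
    """Update parameter alpha_update = c * sqrt(log(B) / n).
    c = 0.6 is the value selected in the paper's hyper-parameter
    experiments; the effective range reported there is c in [0.3, 0.7]."""
    return c * math.sqrt(math.log(B) / n)

# Example with n = 2500 users and a hypothetical coin domain B = 1024.
alpha = alpha_update(2500, 1024)
```

The parameter shrinks as the number of users n grows and grows only logarithmically with the domain size B, which matches its role as a per-round update step.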