Optimal Survey Design for Private Mean Estimation
Authors: Yu-Wei Chen, Raghu Pasupathy, Jordan Awan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We numerically illustrate our method through simulation studies. Section 5.1 compares variances between naive and DP-aware stratified sampling. Section 5.2 explores the interplay between the non-private and purely DP designs. Section 5.4 showcases the computational efficiency of our algorithm. The input of Algorithm 1, $x^*$, is obtained by package nloptr and alabama in R. All computations, including runtime measurements, were conducted on the Purdue Bell clusters using multiple cores. The source codes are available at https://github.com/garyUAchen/DP_Optim_Survey. |
| Researcher Affiliation | Academia | Department of Statistics, Purdue University, West Lafayette, IN, USA. Correspondence to: Jordan Awan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Integer-Optimal Design. Input: $x^*$ (the optimal continuous solution) and the Hessian matrix of $g$: $H_g(x^*)$. for $i = 1, \dots, k-1$ do: define $T_i = \{n_i \in \mathbb{N} : \lfloor x_i^* \rfloor \le n_i \le \lceil x_i^* \rceil\}$; end for. Define $T = \{(n_1, \dots, n_{k-1}, n_k) : n_k = \eta - \sum_{i=1}^{k-1} n_i, \text{ where } (n_1, \dots, n_{k-1}) \in T_1 \times \dots \times T_{k-1}\}$. Select $n_{\text{init}} = \arg\min_{n \in T} g(n)$. Calculate the smallest eigenvalue $\lambda$ of $H_g(x^*)$. Calculate the radius $r = \sqrt{2\,(g(n_{\text{init}}) - g(x^*))/\lambda}$. for $i = 1, \dots, k-1$ do: define $S_i = \{n_i \in \mathbb{N} : \max(x_i^* - r, 1) \le n_i \le \min(x_i^* + r, N_i, \eta - k + 1)\}$; end for. Define $S = \{(n_1, \dots, n_{k-1}, n_k) : n_k = \eta - \sum_{i=1}^{k-1} n_i, \text{ where } (n_1, \dots, n_{k-1}) \in S_1 \times \dots \times S_{k-1}\}$. Select $n^* = \arg\min_{n \in S} g(n)$ by an exhaustive search. Output: $n^*$. |
| Open Source Code | Yes | The source codes are available at https://github.com/garyUAchen/DP_Optim_Survey. |
| Open Datasets | No | The paper describes simulation scenarios with synthetic parameters for population sizes and variances, such as: "In this simulation, there are 4 groups with population sizes N = (7000, 8000, 9000, 10000) and variance σ² = (0.08, 0.082, 0.083, 0.084) and a total sample size η = 200." There is no mention of external public datasets or access information for any dataset. |
| Dataset Splits | No | The paper describes simulation setups using synthetic parameters, not a pre-existing dataset that would require splitting into training, validation, or test sets. Therefore, no dataset split information is provided. |
| Hardware Specification | No | All computations, including runtime measurements, were conducted on the Purdue Bell clusters using multiple cores. While a specific cluster name is mentioned, details such as the CPU model, exact number of cores, or memory specifications are not provided, which are necessary for a specific hardware description. |
| Software Dependencies | No | The input of Algorithm 1, $x^*$, is obtained by package nloptr and alabama in R. This indicates the use of R and specific packages (nloptr and alabama), but no version numbers for R or the packages are provided. |
| Experiment Setup | Yes | In this simulation, there are 4 groups with population sizes N = (7000, 8000, 9000, 10000) and variance σ² = (0.08, 0.082, 0.083, 0.084) and a total sample size η = 200. We plot the variance ratio from a naive subsampling scheme to that of the integer-optimal design while varying ϵ from 0.01 to 100. |
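
The Algorithm 1 pseudocode quoted in the table can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is in R, at the repository linked above). The function name `integer_optimal_design` and its signature are hypothetical. The demo objective `g` is the classical non-private stratified variance without finite population correction, used only as a stand-in because the paper's DP-aware objective is not reproduced in this summary. Likewise, the continuous optimum comes from the closed-form Neyman allocation rather than from nloptr or alabama, and the Hessian is taken with respect to the first k-1 coordinates and assumed positive definite.

```python
import itertools
import math

import numpy as np


def integer_optimal_design(g, x_star, hess, eta, N):
    """Hypothetical sketch of Algorithm 1: integer search around x_star."""
    x_star = np.asarray(x_star, dtype=float)
    k = len(x_star)

    def complete(head):
        # The last stratum receives what is left of the total sample size eta.
        return tuple(head) + (eta - sum(head),)

    def feasible(n):
        return all(1 <= n_i <= N_i for n_i, N_i in zip(n, N))

    # Rounding candidates T: floor and ceiling of each of the first k-1 entries.
    T = [sorted({math.floor(x), math.ceil(x)}) for x in x_star[:-1]]
    candidates = [complete(h) for h in itertools.product(*T)]
    n_init = min((n for n in candidates if feasible(n)), key=g)

    # Search radius from the smallest Hessian eigenvalue (assumed positive).
    lam = np.linalg.eigvalsh(hess).min()
    r = math.sqrt(max(2.0 * (g(n_init) - g(x_star)), 0.0) / lam)

    # Box S: integers n_i with max(x_i - r, 1) <= n_i <= min(x_i + r, N_i, eta - k + 1).
    S = []
    for x, N_i in zip(x_star[:-1], N[:-1]):
        lo = math.ceil(max(x - r, 1))
        hi = math.floor(min(x + r, N_i, eta - k + 1))
        S.append(range(lo, hi + 1))

    # Exhaustive search over S, starting from the rounding solution.
    best, best_val = n_init, g(n_init)
    for head in itertools.product(*S):
        n = complete(head)
        if feasible(n):
            val = g(n)
            if val < best_val:
                best, best_val = n, val
    return best


# Demo with the simulation parameters quoted in the table.
N = np.array([7000, 8000, 9000, 10000])
sigma2 = np.array([0.08, 0.082, 0.083, 0.084])
eta = 200

# ASSUMPTION: stand-in objective, sum_i W_i^2 * sigma_i^2 / n_i (non-private).
a = (N / N.sum()) ** 2 * sigma2


def g(n):
    return float(np.sum(a / np.asarray(n, dtype=float)))


# ASSUMPTION: Neyman allocation as the continuous optimum of the stand-in g.
weights = N * np.sqrt(sigma2)
x_star = eta * weights / weights.sum()

# Hessian of g in the first k-1 coordinates, with n_k = eta - sum of the others.
hess = np.diag(2 * a[:-1] / x_star[:-1] ** 3) + 2 * a[-1] / x_star[-1] ** 3

n_opt = integer_optimal_design(g, x_star, hess, eta, N)
print(n_opt, g(n_opt))
```

To move toward the paper's setting, `g`, `x_star`, and `hess` would be replaced by the DP-aware variance, its constrained continuous minimizer, and its Hessian; the search routine itself depends only on those three inputs plus `eta` and `N`.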