reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Survey Design for Private Mean Estimation

Authors: Yu-Wei Chen, Raghu Pasupathy, Jordan Awan

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We numerically illustrate our method through simulation studies. Section 5.1 compares variances between naive and DP-aware stratified sampling. Section 5.2 explores the interplay between the non-private and purely DP designs. Section 5.4 showcases the computational efficiency of our algorithm. The input of Algorithm 1, x*, is obtained by package nloptr and alabama in R. All computations, including runtime measurements, were conducted on the Purdue Bell clusters using multiple cores. The source codes are available at https://github.com/garyUAchen/DP_Optim_Survey.
Researcher Affiliation	Academia	1Department of Statistics, Purdue University, West Lafayette IN, USA. Correspondence to: Jordan Awan <EMAIL>.
Pseudocode	Yes	Algorithm 1 Integer-Optimal Design. Input: x* (the optimal continuous solution) and the Hessian matrix of g, H_g(x*). for i = 1, ..., k−1 do: define T_i = {n_i ∈ ℕ : ⌊x*_i⌋ ≤ n_i ≤ ⌈x*_i⌉}; end for. Define T = {(n_1, ..., n_{k−1}, n_k) : n_k = η − Σ_{i=1}^{k−1} n_i, where (n_1, ..., n_{k−1}) ∈ T_1 × ... × T_{k−1}}. Select n_init = argmin_{n ∈ T} g(n). Calculate the smallest eigenvalue λ of H_g(x*). Calculate the radius r = √(2(g(n_init) − g(x*))/λ). for i = 1, ..., k−1 do: define S_i = {n_i ∈ ℕ : max(x*_i − r, 1) ≤ n_i ≤ min(x*_i + r, N_i, η − k + 1)}; end for. Define S = {(n_1, ..., n_{k−1}, n_k) : n_k = η − Σ_{i=1}^{k−1} n_i, where (n_1, ..., n_{k−1}) ∈ S_1 × ... × S_{k−1}}. Select n* = argmin_{n ∈ S} g(n) by an exhaustive search. Output: n*.
Open Source Code	Yes	The source codes are available at https://github.com/garyUAchen/DP_Optim_Survey.
Open Datasets	No	The paper describes simulation scenarios with synthetic parameters for population sizes and variances, such as: "In this simulation, there are 4 groups with population sizes N = (7000, 8000, 9000, 10000) and variance σ² = (0.08, 0.082, 0.083, 0.084) and a total sample size η = 200." There is no mention of external public datasets or access information for any dataset.
Dataset Splits	No	The paper describes simulation setups using synthetic parameters, not a pre-existing dataset that would require splitting into training, validation, or test sets. Therefore, no dataset split information is provided.
Hardware Specification	No	All computations, including runtime measurements, were conducted on the Purdue Bell clusters using multiple cores. While a specific cluster name is mentioned, details such as the CPU model, exact number of cores, or memory specifications are not provided, which are necessary for a specific hardware description.
Software Dependencies	No	The input of Algorithm 1, x*, is obtained by package nloptr and alabama in R. This indicates the use of R and specific packages (nloptr and alabama), but no version numbers for R or the packages are provided.
Experiment Setup	Yes	In this simulation, there are 4 groups with population sizes N = (7000, 8000, 9000, 10000) and variance σ² = (0.08, 0.082, 0.083, 0.084) and a total sample size η = 200. We plot the variance ratio from a naive subsampling scheme to that of the integer-optimal design while varying ϵ from 0.01 to 100.
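
The steps listed in the Pseudocode row can be sketched in Python using the simulation parameters from the Experiment Setup row. This is a minimal illustration under stated assumptions, not the authors' implementation (which uses R with nloptr and alabama). The objective g below is a hypothetical stand-in: the ordinary non-private stratified-sampling variance without finite-population correction, chosen because its continuous optimum has a closed form (Neyman allocation). The paper's DP-aware objective is not reproduced here. The helper names complete, feasible, and candidates are illustrative, and the search is seeded with n_init as a safeguard that the pseudocode does not include.

```python
import itertools
import math

import numpy as np

# Simulation parameters quoted in the Experiment Setup row.
N = np.array([7000, 8000, 9000, 10000], dtype=float)  # stratum population sizes
sigma2 = np.array([0.08, 0.082, 0.083, 0.084])        # stratum variances
eta = 200                                             # total sample size
k = len(N)

# ASSUMPTION: stand-in objective, NOT the paper's DP-aware variance.
# g(n) = sum_i (N_i / N_total)^2 * sigma_i^2 / n_i
c = (N / N.sum()) ** 2 * sigma2


def g(n):
    return float(np.sum(c / np.asarray(n, dtype=float)))


# Continuous optimum of the stand-in objective (Neyman allocation).
x_star = eta * np.sqrt(c) / np.sqrt(c).sum()

# Hessian of g in the k-1 free coordinates after substituting
# n_k = eta - (n_1 + ... + n_{k-1}): a diagonal part plus a rank-one part.
H = np.diag(2 * c[:-1] / x_star[:-1] ** 3) \
    + (2 * c[-1] / x_star[-1] ** 3) * np.ones((k - 1, k - 1))


def complete(head):
    """Append n_k = eta - sum(head) to the first k-1 allocations."""
    return tuple(head) + (eta - sum(head),)


def feasible(n):
    return all(1 <= n_i <= N_i for n_i, N_i in zip(n, N))


def candidates(ranges):
    """All feasible allocations whose first k-1 entries lie in the given ranges."""
    for head in itertools.product(*ranges):
        n = complete(head)
        if feasible(n):
            yield n


# Step 1: best allocation among the floor/ceil roundings of x* (the set T).
T = [range(math.floor(x), math.ceil(x) + 1) for x in x_star[:-1]]
n_init = min(candidates(T), key=g)

# Step 2: search radius from the smallest Hessian eigenvalue.
lam = float(np.linalg.eigvalsh(H).min())
r = math.sqrt(2 * (g(n_init) - g(x_star)) / lam)

# Step 3: exhaustive search over the box S around x*.
S = [
    range(math.ceil(max(x - r, 1)), math.floor(min(x + r, N_i, eta - k + 1)) + 1)
    for x, N_i in zip(x_star[:-1], N[:-1])
]
# Seeding with n_init guarantees the result is never worse than the rounding step.
n_star = min(itertools.chain([n_init], candidates(S)), key=g)

print("x* =", np.round(x_star, 2))
print("n_init =", n_init, " r =", round(r, 3))
print("n* =", n_star)
```

Substituting n_k removes the equality constraint, so the Hessian is taken over k−1 free variables. Whether the paper defines H_g(x*) this way is an assumption of this sketch.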