Minimax Optimal Two-Sample Testing under Local Differential Privacy
Authors: Jongmin Mun, Seungwoo Kwak, Ilmun Kim
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical findings with extensive numerical experiments and demonstrate the practical relevance and effectiveness of our proposed methods. |
| Researcher Affiliation | Academia | Jongmin Mun (Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA); Seungwoo Kwak (Division of Future Convergence, Sungkonghoe University, Guro-gu, Seoul, 08359, Republic of Korea); Ilmun Kim (Department of Mathematical Sciences, Korea Advanced Institute of Science & Technology, Yuseong-gu, Daejeon, 34141, Republic of Korea) |
| Pseudocode | No | The paper describes algorithms and procedures in prose, such as the 'Permutation Testing Procedure' in Section 2.3 and 'Testing procedure' in Section 3.2, but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | To facilitate adoption, we provide a Python package privateAB that implements all proposed and baseline methods, available at https://pypi.org/project/privateAB/0.0.2/. The code for replicating the numerical results is available at https://github.com/Jong-Min-Moon/optimal-local-dp-two-sample.git. |
| Open Datasets | No | The paper describes generating data for simulations (e.g., 'perturbed uniform distribution scenario' and 'mean differences between two d-dimensional Gaussian distributions') rather than using or providing access to pre-existing public datasets. |
| Dataset Splits | No | The paper conducts simulation studies where data is generated. It specifies parameters like 'equal sample sizes, denoted as n := n1 = n2', but there is no mention of splitting an external dataset into training, validation, or test sets. |
| Hardware Specification | No | The authors acknowledge the Center for Advanced Research Computing (CARC) at the University of Southern California for providing computing resources that have contributed to the research results reported within this publication. URL: https://carc.usc.edu. While computing resources are mentioned, specific hardware details such as GPU/CPU models or memory amounts are not provided. |
| Software Dependencies | No | The paper mentions providing a 'Python package private AB' but does not specify the version of Python or any other libraries with their respective version numbers. |
| Experiment Setup | Yes | In all simulation scenarios, we consider equal sample sizes, denoted as n := n1 = n2, and fix the significance level at γ = 0.05. We estimate the power by independently repeating the test 2000 times, and calculating the rejection ratio of the null hypothesis. The permutation procedure employs the Monte Carlo p-value given in (5) with B = 999. Our simulation results indicate that κ = 4 is a reasonable choice in most of the scenarios considered in this section, and thus we stick with the equal-sized binning scheme with κ = 4 for density testing. The simulation considers the following combinations of three parameters, namely the number of categories k, the perturbation size η and the privacy parameter α: (k, η, α) ∈ {(4, 0.04), (40, 0.015), (400, 0.002)} × {2, 1, 0.5}. For the location difference, we analyze scenarios involving mean differences between two d-dimensional Gaussian distributions... The dimension of the original data is chosen as d ∈ {3, 4, 5}. |
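The setup row above describes the paper's power-estimation protocol: a permutation test whose Monte Carlo p-value uses B = 999 permutations, with power estimated as the rejection ratio at level γ = 0.05 over independent repetitions. The sketch below illustrates that generic protocol only; the function names (`permutation_p_value`, `estimate_power`), the mean-difference statistic, and the Gaussian sampling are illustrative assumptions, not the authors' privatized test statistic or their `privateAB` implementation.

```python
import numpy as np

def permutation_p_value(x, y, stat, B=999, rng=None):
    """Monte Carlo permutation p-value for a two-sample statistic.

    Uses the standard estimator p = (1 + #{b : T_b >= T_obs}) / (B + 1),
    which yields a valid level-gamma test for any number of permutations B.
    """
    rng = np.random.default_rng(rng)
    pooled = np.concatenate([x, y])
    n1 = len(x)
    t_obs = stat(x, y)
    count = 0
    for _ in range(B):
        perm = rng.permutation(pooled)  # shuffle the pooled sample
        if stat(perm[:n1], perm[n1:]) >= t_obs:
            count += 1
    return (1 + count) / (B + 1)

def estimate_power(sample_pair, stat, gamma=0.05, B=999, reps=100, seed=0):
    """Estimate power as the rejection ratio over independent repetitions
    (the paper uses 2000 repetitions; fewer are used here for speed)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x, y = sample_pair(rng)
        if permutation_p_value(x, y, stat, B=B, rng=rng) <= gamma:
            rejections += 1
    return rejections / reps

# Illustrative alternative: mean shift between two univariate Gaussians.
mean_diff = lambda x, y: abs(x.mean() - y.mean())
draw = lambda rng: (rng.normal(0.0, 1.0, 50), rng.normal(1.0, 1.0, 50))
power = estimate_power(draw, mean_diff)
```

Adding 1 to both numerator and denominator of the p-value counts the observed statistic among the permuted ones, which is what makes the Monte Carlo test exact at level γ regardless of B.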