Minimax Optimal Two-Sample Testing under Local Differential Privacy
Authors: Jongmin Mun, Seungwoo Kwak, Ilmun Kim
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical findings with extensive numerical experiments and demonstrate the practical relevance and effectiveness of our proposed methods. |
| Researcher Affiliation | Academia | Jongmin Mun (Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA); Seungwoo Kwak (Division of Future Convergence, Sungkonghoe University, Guro-gu, Seoul, 08359, Republic of Korea); Ilmun Kim (Department of Mathematical Sciences, Korea Advanced Institute of Science & Technology, Yuseong-gu, Daejeon, 34141, Republic of Korea) |
| Pseudocode | No | The paper describes algorithms and procedures in prose, such as the 'Permutation Testing Procedure' in Section 2.3 and 'Testing procedure' in Section 3.2, but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | To facilitate adoption, we provide a Python package privateAB that implements all proposed and baseline methods, available at https://pypi.org/project/privateAB/0.0.2/. The code for replicating the numerical results is available at https://github.com/Jong-Min-Moon/optimal-local-dp-two-sample.git. |
| Open Datasets | No | The paper describes generating data for simulations (e.g., 'perturbed uniform distribution scenario' and 'mean differences between two d-dimensional Gaussian distributions') rather than using or providing access to pre-existing public datasets. |
| Dataset Splits | No | The paper conducts simulation studies where data is generated. It specifies parameters like 'equal sample sizes, denoted as n := n1 = n2', but there is no mention of splitting an external dataset into training, validation, or test sets. |
| Hardware Specification | No | The authors acknowledge the Center for Advanced Research Computing (CARC) at the University of Southern California for providing computing resources that have contributed to the research results reported within this publication. URL: https://carc.usc.edu. While computing resources are mentioned, specific hardware details such as GPU/CPU models or memory amounts are not provided. |
| Software Dependencies | No | The paper mentions providing a 'Python package private AB' but does not specify the version of Python or any other libraries with their respective version numbers. |
| Experiment Setup | Yes | In all simulation scenarios, we consider equal sample sizes, denoted as n := n1 = n2, and fix the significance level at γ = 0.05. We estimate the power by independently repeating the test 2000 times, and calculating the rejection ratio of the null hypothesis. The permutation procedure employs the Monte Carlo p-value given in (5) with B = 999. Our simulation results indicate that κ = 4 is a reasonable choice in most of the scenarios considered in this section, and thus we stick with the equal-sized binning scheme with κ = 4 for density testing. The simulation considers the following combinations of three parameters, namely the number of categories k, the perturbation size η and the privacy parameter α: (k, η, α) ∈ {(4, 0.04), (40, 0.015), (400, 0.002)} × {2, 1, 0.5}. For the location difference, we analyze scenarios involving mean differences between two d-dimensional Gaussian distributions... The dimension of the original data is chosen as d ∈ {3, 4, 5}. |
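The setup row above describes the paper's power-estimation protocol: a permutation test whose Monte Carlo p-value uses B = 999 permutations, with power estimated as the rejection ratio at level γ = 0.05 over independent repetitions. The sketch below illustrates that generic protocol only; the function names (`permutation_p_value`, `estimate_power`), the mean-difference statistic, and the Gaussian sampling are illustrative assumptions, not the authors' privatized test statistic or their `privateAB` implementation.

```python
import numpy as np

def permutation_p_value(x, y, stat, B=999, rng=None):
    """Monte Carlo permutation p-value for a two-sample statistic.

    Uses the standard estimator p = (1 + #{b : T_b >= T_obs}) / (B + 1),
    which yields a valid level-gamma test for any number of permutations B.
    """
    rng = np.random.default_rng(rng)
    pooled = np.concatenate([x, y])
    n1 = len(x)
    t_obs = stat(x, y)
    count = 0
    for _ in range(B):
        perm = rng.permutation(pooled)  # shuffle the pooled sample
        if stat(perm[:n1], perm[n1:]) >= t_obs:
            count += 1
    return (1 + count) / (B + 1)

def estimate_power(sample_pair, stat, gamma=0.05, B=999, reps=100, seed=0):
    """Estimate power as the rejection ratio over independent repetitions
    (the paper uses 2000 repetitions; fewer are used here for speed)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x, y = sample_pair(rng)
        if permutation_p_value(x, y, stat, B=B, rng=rng) <= gamma:
            rejections += 1
    return rejections / reps

# Illustrative alternative: mean shift between two univariate Gaussians.
mean_diff = lambda x, y: abs(x.mean() - y.mean())
draw = lambda rng: (rng.normal(0.0, 1.0, 50), rng.normal(1.0, 1.0, 50))
power = estimate_power(draw, mean_diff)
```

Adding 1 to both numerator and denominator of the p-value counts the observed statistic among the permuted ones, which is what makes the Monte Carlo test exact at level γ regardless of B.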