Distribution Estimation under the Infinity Norm

Authors: Aryeh Kontorovich, Amichai Painsky

JMLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We illustrate our results in synthetic and real-world experiments. Finally, we apply our proposed framework to a basic selective inference problem, where we estimate the most frequent probabilities in a sample."

Researcher Affiliation | Academia | Aryeh Kontorovich, Department of Computer Science, Ben Gurion University of the Negev, Beer Sheva, Israel; Amichai Painsky, School of Industrial Engineering, Tel Aviv University, Tel Aviv, Israel.

Pseudocode | No | The paper describes algorithms and methods but does not present structured pseudocode or algorithm blocks with specific steps.

Open Source Code | No | The paper contains no explicit statement about the release of source code for the described methodology, nor a link to a code repository.

Open Datasets | Yes | "We begin with a census data; we consider the 2000 United States Census (Bureau, 2014), which lists the frequency of the top 1000 most common last names in the United States."

Dataset Splits | No | The paper states: "In each experiment we draw n samples from a distribution over an alphabet size A to evaluate the proposed bounds for a given confidence level 1 − δ. We repeat this process 10^4 times and report the average bound and coverage rate..." This describes data sampling and repetition, but not training/validation/test splits for machine learning experiments.

Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used to run its experiments.

Software Dependencies | No | The paper does not specify the software libraries, frameworks, or version numbers used in the experiments.

Experiment Setup | Yes | "We set s = 1.1 throughout our experiments. In the first experiment we focus on A = 100 and δ = 0.05. We examine three bounds. First, we consider the bound from Theorem 2 with δ1 = 0.99δ, δ2 = 0.01δ. We set m to minimize (13) over the worst-case distribution (see (10)). This results in m = log(1/δ1) + 2 ≈ 8. Further, we examine Theorem 6 and the benchmark bound (1). ... Next, we examine the performance of our proposed schemes for a decaying confidence level, δ = 1/n^2. As above, we set A = 100 and focus on the two benchmark distributions. ... for a typical experiment of n = 10000"
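The quoted experiment protocol — draw n samples from a distribution over an alphabet of size A, compute an L∞ confidence bound around the empirical distribution, repeat 10^4 times, and report the average bound and coverage rate — can be sketched as below. The paper's Theorem 2 and Theorem 6 bounds are not reproduced here; as a stand-in we use a classical Hoeffding-plus-union (Bonferroni) bound on max_i |p̂_i − p_i|, which is an assumption of this sketch, not the paper's method. The function names are hypothetical.

```python
import numpy as np

def hoeffding_linf_bound(n, A, delta):
    """Hoeffding + union bound on max_i |p_hat_i - p_i| over A symbols.

    A classical benchmark used here only as a stand-in for the paper's
    bounds, which are not reproduced in this sketch.
    """
    return np.sqrt(np.log(2 * A / delta) / (2 * n))

def coverage_rate(p, n, delta, trials=10_000, seed=0):
    """Fraction of trials in which the true distribution p lies within
    the L-infinity bound around the empirical distribution."""
    rng = np.random.default_rng(seed)
    eps = hoeffding_linf_bound(n, len(p), delta)
    counts = rng.multinomial(n, p, size=trials)   # (trials, A) sample counts
    p_hat = counts / n                            # empirical distributions
    return float(np.mean(np.max(np.abs(p_hat - p), axis=1) <= eps))

# Uniform benchmark distribution, matching the quoted setup:
# A = 100, delta = 0.05, n = 10000, 10^4 repetitions.
A, n, delta = 100, 10_000, 0.05
p = np.ones(A) / A
print(f"bound = {hoeffding_linf_bound(n, A, delta):.4f}")
print(f"coverage = {coverage_rate(p, n, delta):.4f}")
```

Since the stand-in bound is conservative, the empirical coverage rate should comfortably exceed the nominal 1 − δ = 0.95; the paper's tighter bounds would yield a smaller ε at the same coverage target.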