The Empirical Mean is Minimax Optimal for Local Glivenko-Cantelli

Authors: Doron Cohen, Aryeh Kontorovich, Roi Weiss

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To support our theoretical findings, we present two sets of simulations. The first demonstrates the tightness of the lower bound in Theorem 2.2, while the second highlights a specific setting where the simple average estimator outperforms the Empirical Mean Estimator (EME), complementing the results of Theorem 2.3. ... Figure 2 shows the results. The empirical deviations (dashed lines) closely follow the theoretical bounds (solid lines), confirming the tightness of the lower bound in Theorem 2.2. As expected, larger values of J lead to smoother empirical curves, emphasizing the role of averaging in reducing variance. Notably, the empirical deviations converge to the theoretical decay rate as n grows. ... Figure 3. Error comparison between the EME and the simple average estimator for varying sample sizes n under different distributions: uniform, triangular, Beta(2,2), exponential, 1/n, and Gaussian.
Researcher Affiliation | Academia | (1) Department of Computer Science, Ben-Gurion University of the Negev (BGU), Israel; (2) Department of Computer Science, Ariel University, Israel. Correspondence to: Doron Cohen <EMAIL>.
Pseudocode | No | The paper describes methods and proofs using mathematical notation and textual explanations, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement regarding the availability of source code, nor does it include links to a repository or mention code in supplementary materials.
Open Datasets | No | The simulations in Section A use standard mathematical distributions (uniform, triangular, Beta(2,2), exponential, 1/n, and Gaussian) for generating data, but the paper does not specify any publicly available or open datasets that require concrete access information.
Dataset Splits | No | The paper's simulation section describes generating data from mathematical distributions (e.g., uniform, Gaussian) for empirical evaluation, but it does not utilize or define specific training, validation, or test splits for a named dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the simulations or other computations.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for the implementation or simulations.
Experiment Setup | Yes | We consider six values of q: q = 0.1, q = 0.2, q = 0.05, q = 0.01, q = 0.005, and q = 0.002. For each configuration, empirical results are averaged over J = 100, 1000, and 10000 repetitions to ensure stability. ... We evaluate the performance of the EME and the simple average estimator under six different distributions: uniform, triangular, Beta(2,2), exponential, 1/n-scaled, and Gaussian. For each distribution, we vary the number of trials k ∈ {10, 50, 100, 500} and compute the error as a function of the sample size n.
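
Since the paper releases no code, the following is only a minimal sketch of how a repetition-averaged deviation of the kind described under Experiment Setup might be simulated. It is not the authors' implementation. It assumes a product-Bernoulli setting in which the error is the largest coordinate-wise gap between the empirical mean and the true mean vector; that setting, the vector p, the number of coordinates, and the function names are all hypothetical and are not taken from this summary.

```python
import numpy as np


def sup_deviation(p, n, rng):
    """One draw of max_j |p_hat_j - p_j| from n i.i.d. samples (assumed setting)."""
    # The sum of n Bernoulli(p_j) draws is Binomial(n, p_j), so sample it directly.
    p_hat = rng.binomial(n, p) / n
    return float(np.max(np.abs(p_hat - p)))


def mean_sup_deviation(p, n, J, seed=0):
    """Average the sup deviation over J independent repetitions."""
    rng = np.random.default_rng(seed)
    return float(np.mean([sup_deviation(p, n, rng) for _ in range(J)]))


if __name__ == "__main__":
    q = 0.1                 # one of the q values listed in the summary
    p = np.full(50, q)      # hypothetical: 50 coordinates, all with mean q
    for n in (100, 1000, 10000):
        print(n, mean_sup_deviation(p, n, J=100))
```

Raising J only reduces the Monte Carlo noise in the average, which matches the summary's remark that larger J gives smoother empirical curves. The decay in n comes from the estimator itself.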