The Empirical Mean is Minimax Optimal for Local Glivenko-Cantelli
Authors: Doron Cohen, Aryeh Kontorovich, Roi Weiss
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To support our theoretical findings, we present two sets of simulations. The first demonstrates the tightness of the lower bound in Theorem 2.2, while the second highlights a specific setting where the simple average estimator outperforms the Empirical Mean Estimator (EME), complementing the results of Theorem 2.3. ... Figure 2 shows the results. The empirical deviations (dashed lines) closely follow the theoretical bounds (solid lines), confirming the tightness of the lower bound in Theorem 2.2. As expected, larger values of J lead to smoother empirical curves, emphasizing the role of averaging in reducing variance. Notably, the empirical deviations converge to the theoretical decay rate as n grows. ... Figure 3. Error comparison between the EME and the simple average estimator for varying sample sizes n under different distributions: uniform, triangular, Beta(2,2), exponential, 1/n, and Gaussian. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Ben-Gurion University of the Negev (BGU), Israel; ²Department of Computer Science, Ariel University, Israel. Correspondence to: Doron Cohen <EMAIL>. |
| Pseudocode | No | The paper describes methods and proofs using mathematical notation and textual explanations, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement regarding the availability of source code, nor does it include links to a repository or mention code in supplementary materials. |
| Open Datasets | No | The simulations in Section A use standard mathematical distributions (uniform, triangular, Beta(2,2), exponential, 1/n, and Gaussian) for generating data, but the paper does not specify any publicly available or open datasets that require concrete access information. |
| Dataset Splits | No | The paper's simulation section describes generating data from mathematical distributions (e.g., uniform, Gaussian) for empirical evaluation, but it does not utilize or define specific training, validation, or test splits for a named dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the simulations or other computations. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for the implementation or simulations. |
| Experiment Setup | Yes | We consider six values of q: q = 0.1, q = 0.2, q = 0.05, q = 0.01, q = 0.005, and q = 0.002. For each configuration, empirical results are averaged over J = 100, 1000, and 10000 repetitions to ensure stability. ... We evaluate the performance of the EME and the simple average estimator under six different distributions: uniform, triangular, Beta(2,2), exponential, 1/n-scaled, and Gaussian. For each distribution, we vary the number of trials k ∈ {10, 50, 100, 500} and compute the error as a function of the sample size n. |
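
Since the paper provides no code, the averaging protocol quoted in the Experiment Setup row can only be sketched under assumptions. The snippet below assumes (this is not stated in the summary) that q is a Bernoulli parameter and that the quantity of interest is the absolute error of the empirical mean over n draws, averaged over J independent repetitions. The function names `empirical_mean_deviation` and `averaged_deviation`, the seed, and the sample sizes in the loop are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random
from statistics import fmean


def empirical_mean_deviation(q, n, rng):
    """Absolute error |p_hat - q| of the empirical mean of n Bernoulli(q) draws.

    Assumption: q is a Bernoulli parameter; the summary does not define q.
    """
    p_hat = sum(rng.random() < q for _ in range(n)) / n
    return abs(p_hat - q)


def averaged_deviation(q, n, repetitions, seed=0):
    """Average the deviation over `repetitions` independent runs (the role of J)."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    return fmean(empirical_mean_deviation(q, n, rng) for _ in range(repetitions))


if __name__ == "__main__":
    # The six q values listed in the Experiment Setup row; n and J are illustrative.
    for q in (0.1, 0.2, 0.05, 0.01, 0.005, 0.002):
        row = [round(averaged_deviation(q, n, repetitions=100), 4) for n in (100, 1000)]
        print(q, row)
```

Raising `repetitions` from 100 toward 10000 should reduce the noise in each averaged value, which is consistent with the summary's remark that larger J yields smoother empirical curves.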