Convergence Guarantees for the Good-Turing Estimator
Authors: Amichai Painsky
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive empirical study which demonstrates the performance of the proposed estimator, compared to currently known schemes. Finally, in Section 8 we compare our suggested framework with currently known estimators in a series of synthetic and real-world experiments. |
| Researcher Affiliation | Academia | Amichai Painsky, Department of Industrial Engineering, Tel Aviv University, Tel Aviv, Israel |
| Pseudocode | No | The paper focuses on mathematical derivations, theorems, and proofs related to the Good-Turing estimator. It does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to any code repositories. |
| Open Datasets | Yes | We begin with a corpus linguistic experiment. The popular Broadway play Hamilton consists of 20,520 words, of which m = 3,578 are distinct. Gao et al. (2007) considered the forearm skin biota of six subjects. Finally, we study census data. The lower row of Figure 5 considers the 2000 United States Census (Bureau, 2014), which lists the frequency of the top m = 1000 most common last names in the United States. |
| Dataset Splits | No | In each experiment we draw n samples, and compare the occupancy probabilities Mk(Xn) with their corresponding estimators, for different values of k. To attain an averaged error, we repeat each experiment 1000 times, and average the squared error. The paper describes a sampling and resampling evaluation methodology rather than traditional dataset splits for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models or other computer specifications. |
| Software Dependencies | No | The paper does not mention any specific software or library names along with their version numbers that would be necessary to replicate the experiments. |
| Experiment Setup | No | The paper describes the mathematical formulations of the estimators and analyzes their convergence rates. While it discusses sample sizes (n) and k values for evaluation, it does not specify hyperparameters, training configurations, or system-level settings typically found in experimental setups for machine learning models. |
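The evaluation methodology quoted above (draw n samples, compare the occupancy probability M_k with its Good-Turing estimate, and average the squared error over 1000 repetitions) can be sketched as follows. This is a minimal illustration, not the paper's code: the distribution, sample size, and helper name `gt_squared_error` are assumptions, and the standard Good-Turing form M̂_k = (k+1)·N_{k+1}/n is used, where N_{k+1} counts symbols appearing exactly k+1 times.

```python
import random
from collections import Counter

def gt_squared_error(probs, n, k, trials=1000, seed=0):
    """Average squared error of the Good-Turing estimate of M_k.

    M_k(X^n) is the total probability mass of symbols that appear
    exactly k times in the sample; the Good-Turing estimate is
    (k+1) * N_{k+1} / n, with N_{k+1} the number of symbols that
    appear exactly k+1 times.
    """
    rng = random.Random(seed)
    symbols = list(range(len(probs)))
    total_sq_err = 0.0
    for _ in range(trials):
        sample = rng.choices(symbols, weights=probs, k=n)
        counts = Counter(sample)
        # True occupancy probability for this sample.
        m_k = sum(p for s, p in zip(symbols, probs) if counts.get(s, 0) == k)
        # Good-Turing estimate from the same sample.
        n_k_plus_1 = sum(1 for c in counts.values() if c == k + 1)
        gt_k = (k + 1) * n_k_plus_1 / n
        total_sq_err += (m_k - gt_k) ** 2
    return total_sq_err / trials

# Example: uniform distribution over 100 symbols, n = 500 samples, k = 1.
err = gt_squared_error([0.01] * 100, n=500, k=1)
```

Repeating this for several values of k and n, and for the empirical distributions of the Hamilton corpus, skin-biota, and census data, would mirror the averaged-error comparison the paper reports.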