Convergence Guarantees for the Good-Turing Estimator

Authors: Amichai Painsky

JMLR 2022

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "An extensive empirical study which demonstrates the performance of the proposed estimator, compared to currently known schemes. The rest of the manuscript is organized as follows. Finally, in Section 8 we compare our suggested framework with currently known estimators in a series of synthetic and real-world experiments."

Researcher Affiliation | Academia | Amichai Painsky, Department of Industrial Engineering, Tel Aviv University, Tel Aviv, Israel

Pseudocode | No | The paper focuses on mathematical derivations, theorems, and proofs related to the Good-Turing estimator. It does not contain any structured pseudocode or algorithm blocks.

Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide links to any code repository.

Open Datasets | Yes | "We begin with a corpus linguistic experiment. The popular Broadway play Hamilton consists of 20,520 words, of which m = 3,578 are distinct. Gao et al. (2007) considered the forearm skin biota of six subjects. Finally, we study census data. The lower row of Figure 5 considers the 2000 United States Census (Bureau, 2014), which lists the frequency of the top m = 1000 most common last names in the United States."

Dataset Splits | No | "In each experiment we draw n samples, and compare the occupancy probabilities Mk(Xn) with their corresponding estimators, for different values of k. To attain an averaged error, we repeat each experiment 1000 times, and average the squared error." The paper describes a sampling-and-resampling evaluation methodology rather than traditional training/validation/test splits.
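The evaluation methodology quoted above (draw n samples, compare the occupancy probabilities Mk(Xn) with their estimators, repeat and average the squared error) can be sketched in a few lines of Python. This is an illustrative reconstruction, not the author's code: it uses the classical Good-Turing estimate (k+1)·N_{k+1}/n of M_k, and a randomly drawn synthetic distribution stands in for the paper's data sets.

```python
import numpy as np
from collections import Counter

def good_turing_estimate(sample, k):
    """Classical Good-Turing estimate of the occupancy probability M_k:
    (k+1) * N_{k+1} / n, where N_j is the number of symbols seen j times."""
    n = len(sample)
    counts = Counter(sample)
    freq_of_freq = Counter(counts.values())
    return (k + 1) * freq_of_freq.get(k + 1, 0) / n

def true_occupancy(sample, p, k):
    """True M_k(X^n): total probability mass of the symbols that appear
    exactly k times in the sample (for k = 0, the unseen symbols)."""
    counts = Counter(sample)
    if k == 0:
        seen = set(counts)
        return sum(pi for i, pi in enumerate(p) if i not in seen)
    return sum(p[s] for s, c in counts.items() if c == k)

# Synthetic experiment in the spirit of the quoted setup: repeat the draw
# 1000 times and average the squared error for one value of k.
# (m, n, k below are illustrative choices, not the paper's.)
rng = np.random.default_rng(0)
m, n, k, reps = 100, 500, 1, 1000
p = rng.dirichlet(np.ones(m))          # a random distribution over m symbols
sq_err = 0.0
for _ in range(reps):
    x = rng.choice(m, size=n, p=p)
    sq_err += (good_turing_estimate(x, k) - true_occupancy(x, p, k)) ** 2
print("mean squared error:", sq_err / reps)
```

The same loop, rerun over a grid of n and k, reproduces the shape of the paper's comparison (without the competing estimators, which Section 8 also evaluates).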
Hardware Specification | No | The paper does not provide any details about the hardware used to run the experiments, such as CPU/GPU models or other machine specifications.

Software Dependencies | No | The paper does not name any software libraries or version numbers that would be needed to replicate the experiments.

Experiment Setup | No | The paper describes the mathematical formulations of the estimators and analyzes their convergence rates. While it reports sample sizes (n) and the values of k used for evaluation, it does not specify hyperparameters, training configurations, or other system-level settings typical of machine-learning experiment setups.