Consistent estimation of small masses in feature sampling

Authors: Fadhel Ayed, Marco Battiston, Federico Camerlenghi, Stefano Favaro

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper we study the problem of consistent estimation of the small mass $M_{n,r}$. We first show that there do not exist universally consistent estimators, in the multiplicative sense, of the missing mass $M_{n,0}$. Then, we introduce an estimator of $M_{n,r}$ and identify sufficient conditions under which the estimator is consistent. In particular, we propose a nonparametric estimator $\hat{M}_{n,r}$ of $M_{n,r}$ which has the same analytic form as the celebrated Good–Turing estimator for small probabilities, with the sole difference that the two estimators have different ranges (supports). Then, we show that $\hat{M}_{n,r}$ is strongly consistent, in the multiplicative sense, under the assumption that $(p_j)_{j \geq 1}$ has regularly varying heavy tails.
Researcher Affiliation | Academia | Fadhel Ayed EMAIL, Department of Statistics, University of Oxford, 24-29 St Giles, OX1 3LB Oxford, United Kingdom. Marco Battiston EMAIL, Department of Mathematics and Statistics, Lancaster University, Fylde Ave, Bailrigg, LA1 4YR Lancaster, United Kingdom. Federico Camerlenghi EMAIL, Department of Economics, Management and Statistics, University of Milano-Bicocca, Piazza dell'Ateneo Nuovo 1, 20126 Milano, Italy. Stefano Favaro EMAIL, Department of Economics and Statistics, University of Torino and Collegio Carlo Alberto, Corso Unione Sovietica 218/bis, 10134 Torino, Italy.
Pseudocode | No | The paper contains mathematical derivations, definitions, theorems, and proofs (e.g., Theorem 1, Proposition 4, Corollary 6, Theorem 11, Lemma 13) but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets. It discusses theoretical frameworks such as feature sampling and species sampling, and applications in fields such as genetics, but does not refer to any concrete, publicly available dataset used for empirical evaluation.
Dataset Splits | No | The paper is purely theoretical and does not involve empirical experiments with datasets; therefore, there is no information regarding dataset splits.
Hardware Specification | No | The paper focuses on theoretical contributions, including proofs and the development of estimators. It does not describe any empirical experiments, and thus no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not present any empirical experiments. Consequently, no software dependencies (library or solver names with version numbers) are mentioned for replication.
Experiment Setup | No | The paper is theoretical and focuses on mathematical derivations and proofs concerning estimators. It does not contain an experimental section or details regarding hyperparameters or system-level training settings.
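Since the paper ships no code, the following is a minimal illustrative sketch of the estimator described under Research Type, not the authors' implementation. It assumes the standard feature-sampling model, in which each of the $n$ samples displays feature $j$ independently with probability $p_j$, and it assumes $\hat{M}_{n,r}$ takes the Good–Turing form $(r+1)K_{n,r+1}/n$, where $K_{n,r+1}$ is the number of features displayed exactly $r+1$ times. Under this model $\mathbb{E}[M_{n,r}] = \frac{r+1}{n+1}\,\mathbb{E}[K_{n+1,r+1}]$, which is why that form is a natural plug-in. All function names and the power-law choice $p_j = j^{-1/\alpha}$ (truncated to finitely many features) are hypothetical.

```python
import random
from typing import Sequence


def feature_counts(samples: Sequence[Sequence[int]]) -> list[int]:
    """Return X_{n,j}: how many of the n binary samples display feature j."""
    return [sum(column) for column in zip(*samples)]


def good_turing_small_mass(samples: Sequence[Sequence[int]], r: int) -> float:
    """Assumed Good-Turing-form estimate (r + 1) * K_{n,r+1} / n of M_{n,r}.

    K_{n,r+1} counts features displayed exactly r + 1 times. Unlike the
    species-sampling Good-Turing estimator, the value is not capped at 1.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    k_next = sum(1 for c in feature_counts(samples) if c == r + 1)
    return (r + 1) * k_next / n


def true_small_mass(samples: Sequence[Sequence[int]], p: Sequence[float], r: int) -> float:
    """Oracle M_{n,r}: total probability of features displayed exactly r times."""
    return sum(pj for pj, c in zip(p, feature_counts(samples)) if c == r)


def simulate(n: int, p: Sequence[float], seed: int = 0) -> list[list[int]]:
    """Draw n binary samples; feature j is displayed with probability p[j]."""
    rng = random.Random(seed)
    return [[1 if rng.random() < pj else 0 for pj in p] for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical power-law probabilities p_j = j^(-1/alpha), truncated at J features.
    alpha, J, n = 0.6, 2000, 500
    p = [j ** (-1 / alpha) for j in range(1, J + 1)]
    samples = simulate(n, p, seed=1)
    for r in range(3):
        est = good_turing_small_mass(samples, r)
        truth = true_small_mass(samples, p, r)
        print(f"r={r}: estimate={est:.4f} oracle={truth:.4f} ratio={est / truth:.3f}")
```

The printed ratio is the quantity that multiplicative consistency concerns: it should approach 1 as $n$ grows, which the paper establishes under its regularly-varying-tail assumption (a truncated power law only mimics that regime for moderate $n$). The range difference is also visible here: with a single sample displaying three features, the $r = 0$ estimate is $3$, whereas a species-sampling Good–Turing estimate never exceeds 1.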