reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Consistent estimation of small masses in feature sampling

Authors: Fadhel Ayed, Marco Battiston, Federico Camerlenghi, Stefano Favaro

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper we study the problem of consistent estimation of the small mass $M_{n,r}$. We first show that there do not exist universally consistent estimators, in the multiplicative sense, of the missing mass $M_{n,0}$. Then, we introduce an estimator of $M_{n,r}$ and identify sufficient conditions under which the estimator is consistent. In particular, we propose a nonparametric estimator $\hat{M}_{n,r}$ of $M_{n,r}$ which has the same analytic form of the celebrated Good-Turing estimator for small probabilities, with the sole difference that the two estimators have different ranges (supports). Then, we show that $\hat{M}_{n,r}$ is strongly consistent, in the multiplicative sense, under the assumption that $(p_j)_{j \geq 1}$ has regularly varying heavy tails.
Researcher Affiliation	Academia	Fadhel Ayed EMAIL Department of Statistics, University of Oxford, 24-29 St Giles, OX1 3LB Oxford, United Kingdom. Marco Battiston EMAIL Department of Mathematics and Statistics, Lancaster University, Fylde Ave, Bailrigg, LA1 4YR Lancaster, United Kingdom. Federico Camerlenghi EMAIL Department of Economics, Management and Statistics, University of Milano Bicocca, Piazza dell'Ateneo Nuovo 1, 20126 Milano, Italy. Stefano Favaro EMAIL Department of Economics and Statistics, University of Torino and Collegio Carlo Alberto, Corso Unione Sovietica 218/bis, 10134 Torino, Italy.
Pseudocode	No	The paper contains mathematical derivations, definitions, theorems, and proofs (e.g., Theorem 1, Proposition 4, Corollary 6, Theorem 11, Lemma 13) but does not present any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any explicit statements or links indicating the release of open-source code for the methodology described.
Open Datasets	No	The paper is theoretical and does not conduct experiments using specific datasets. It discusses theoretical frameworks like 'feature sampling' and 'species sampling' and applications in fields like 'genetics' but does not refer to any concrete, publicly available datasets used for empirical evaluation.
Dataset Splits	No	The paper is purely theoretical and does not involve empirical experiments with datasets; therefore, there is no information regarding dataset splits.
Hardware Specification	No	The paper focuses on theoretical contributions, including proofs and the development of estimators. It does not describe any empirical experiments, and thus no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and does not present any empirical experiments. Consequently, there are no software dependencies, including specific library or solver names with version numbers, mentioned for replication.
Experiment Setup	No	This paper is theoretical and focuses on mathematical derivations and proofs of estimators. It does not contain an experimental section or details regarding hyperparameters or system-level training settings.