How Much is Unseen Depends Chiefly on Information About the Seen

Authors: Seongmin Lee, Marcel Boehme

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, our search algorithm discovers estimators that have a substantially smaller MSE than the state-of-the-art Good-Turing estimator. This holds for over 93% of runs when there are at least as many samples as classes. Our estimators' MSE is roughly 80% of the Good-Turing estimator's.
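For context, the Good-Turing baseline the paper compares against estimates the missing mass (the total probability of classes never observed) as f1/n, where f1 is the number of classes seen exactly once in a sample of size n. A minimal sketch (function name is ours):

```python
from collections import Counter

def good_turing_missing_mass(sample):
    """Good-Turing estimate of the missing mass M0: the fraction of
    sample elements that are singletons (seen exactly once)."""
    counts = Counter(sample)
    n = len(sample)
    f1 = sum(1 for c in counts.values() if c == 1)
    return f1 / n

# Two singletons ("c", "d") out of six draws -> estimate 2/6
print(good_turing_missing_mass(["a", "a", "b", "b", "c", "d"]))
```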
Researcher Affiliation | Academia | Seongmin Lee and Marcel Böhme, MPI for Security and Privacy, Germany. EMAIL
Pseudocode | Yes | Algorithm 1: Genetic Algorithm
  Input: Target frequency k, Sample X^n, Iteration limit G, mutant size m
  1: Population P_0 = {r_0}
  2: Fitness f_best = f_0 = fitness(r_0)
  3: Limit G_L = G
  4: for g from 1 to G_L do
  5:   P' = selectTopM(P_{g-1}, m)
  6:   P' = lapply(P', mutate)
  7:   P_g = P' ∪ {r_0} ∪ selectTopM(P_{g-1}, 3)
  8:   f_g = min(lapply(P_g, fitness))
  9:   if (g = G_L) ∧ ((f_g = f_0) ∨ (f_best > 0.95 · f_g)) then
  10:    G_L = G_L + G
  11:  f_best = f_g
  12: Estimator M̂_k^Evo = instantiate(selectTopM(P_{G_L}, 1))
  Output: Minimal-MSE Estimator M̂_k^Evo
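The loop in Algorithm 1 can be sketched as plain Python. The individual representation, mutation operator, and fitness function are problem-specific, so they are passed in as parameters; this is one illustrative reading of the pseudocode (including its limit-extension rule), not the authors' implementation:

```python
import random

def genetic_search(r0, fitness, mutate, G=100, m=40, G_max=2000):
    """Sketch of Algorithm 1: evolve candidates from seed r0, extending
    the generation limit G_L by G at the boundary while the search has
    not improved on r0 or is still improving; G_max caps G_L."""
    population = [r0]
    f0 = f_best = fitness(r0)
    GL = G
    g = 1
    while g <= GL and GL <= G_max:
        top = sorted(population, key=fitness)[:m]       # selectTopM(P_{g-1}, m)
        mutants = [mutate(r) for r in top]              # lapply(P', mutate)
        # P_g = mutants ∪ {r0} ∪ top-3 elites of P_{g-1}
        population = mutants + [r0] + sorted(population, key=fitness)[:3]
        f_g = min(fitness(r) for r in population)
        if g == GL and (f_g == f0 or f_best > 0.95 * f_g):
            GL += G                                     # extend the limit
        f_best = f_g
        g += 1
    return min(population, key=fitness)                 # instantiate top-1

# Toy usage: minimise |x - 3| starting from x = 0 (deterministic with a seed)
random.seed(0)
best = genetic_search(
    r0=0.0,
    fitness=lambda x: abs(x - 3.0),
    mutate=lambda x: x + random.gauss(0, 1),
    G=10, m=5, G_max=100,
)
```

Since `r0` is re-inserted into every generation, the returned individual is never worse than the seed.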
Open Source Code | Yes | Open Science and Replication. For scrutiny and replicability, we publish all our evaluation scripts at: https://github.com/niMgnoeSeeL/UnseenGA.
Open Datasets | Yes | Distributions. We use the same six multinomial distributions that are used in previous evaluations (Orlitsky & Suresh, 2015; Orlitsky et al., 2016; Hao & Li, 2020): a uniform distribution (uniform), a half-and-half distribution where half of the elements have three times the probability of the other half (half&half), two Zipf distributions with parameters s = 1 and s = 0.5 (zipf-1, zipf-0.5), and distributions generated by a Dirichlet-1 prior and a Dirichlet-0.5 prior (diri-1, diri-0.5, respectively). [...] We first use the Australian population-data-by-region dataset ((Arvidsson, 2023), S = 104) to estimate M0 for n = 50 random data points. [...] We also apply our method to the Shakespeare dataset (Sha) (S = 935, |Dataset| = 111,396), commonly used in the literature (Efron & Thisted, 1976), focusing on missing mass for player frequency.
Dataset Splits | No | The paper describes repeating experiments 100 times by taking 100 different samples X^n of size n, and 10K times for extended samples. This describes a sampling strategy but does not provide the specific train/validation/test dataset splits typically used for model training and evaluation.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers, such as programming languages or libraries.
Experiment Setup | Yes | E.1 HYPERPARAMETERS. For evaluating Algorithm 1, we use the following hyperparameters: ... The number of generations G = 100. To keep the algorithm from converging to a local minimum, we limit the maximum number of generations to 2000. The mutant size m = 40. When selecting the individuals for the mutation, we use tournament selection with tournament size t = 3... To keep the estimator from becoming too complex, we limit the maximum number of terms in the estimator to 20.
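The hyperparameters quoted from Appendix E.1 could be collected into a single configuration; the values come from the paper, while the key names are ours:

```python
# Hyperparameters reported in Appendix E.1 of the paper (key names are ours)
GA_HYPERPARAMS = {
    "generations_per_round": 100,  # G: generations before the limit check
    "max_generations": 2000,       # hard cap on the extended limit G_L
    "mutant_size": 40,             # m: individuals selected for mutation
    "tournament_size": 3,          # t: tournament selection size
    "max_terms": 20,               # complexity cap on the evolved estimator
}
```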