reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures

Authors: Alina Ene, Alessandro Epasto, Vahab Mirrokni, Hoai-An Nguyen, Huy Nguyen, David Woodruff, Peilin Zhong

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluation confirms the practicality of our fingerprinting algorithms, demonstrating a speedup of up to 210x over prior work. Experimental Results: We illustrate the practicality of our fingerprinting algorithms by running experiments on two different datasets.
Researcher Affiliation	Collaboration	Alina Ene (1), Alessandro Epasto (2), Vahab Mirrokni (2), Hoai-An Nguyen (3), Huy L. Nguyen (4), David P. Woodruff (2, 3), Peilin Zhong (2). (1) Boston University; (2) Google Research; (3) Carnegie Mellon University; (4) Northeastern University.
Pseudocode	Yes	Algorithm 1 building-A (n × d matrix A, ε ∈ (0, 1), k) ... Algorithm 2 max-coverage ... Algorithm 3 A (k, ε, δ) ... Algorithm 4 k-cover ... Algorithm 5 Max-Coverage-LS (n × d matrix A, ε ∈ (0, 1), k) ... Algorithm 6 sketchy-submodular-maximization ... Algorithm 7 p-Tuples-Sketch (n × 1 vector x, constant integer p ≥ 2, γ, δ ∈ (0, 1)) ... Algorithm 8 general-fingerprinting-sketch (n × d matrix A, ε ∈ (0, 1), k ≥ 0)
Open Source Code	Yes	All experiments were run locally on an M2 MacBook Air. The code can be found here.
Open Datasets	Yes	We use two publicly-available datasets, the UC Irvine Adult and US Census Data (1990) (Becker & Kohavi, 1996; Meek et al.). Aeberhard, S. and Forina, M. Wine. UCI Machine Learning Repository, 1991. DOI: https://doi.org/10.24432/C5PC7J. Becker, B. and Kohavi, R. Adult. UCI Machine Learning Repository, 1996. DOI: https://doi.org/10.24432/C5XW20. Meek, C., Thiesson, B., and Heckerman, D. US Census Data (1990). UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5VP42.
Dataset Splits	No	The paper does not explicitly provide specific dataset split information (e.g., percentages, sample counts for training, validation, or test sets).
Hardware Specification	Yes	All experiments were run locally on an M2 MacBook Air.
Software Dependencies	No	The paper mentions the implementation of a baseline (
Experiment Setup	Yes	The main variable we vary is the size of our L0 sketch, specifically with 300, 600, 900, and 1,250 rows. Then, we ran k-means with 3 clusters (for the 3 wine types) using just the selected features.
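The Pseudocode row only names the paper's algorithms (e.g. max-coverage, k-cover, Max-Coverage-LS). For orientation, the classic offline greedy algorithm for maximum coverage, which repeatedly takes the set covering the most not-yet-covered elements and is known to achieve a (1 − 1/e) approximation, can be sketched in Python as below. This is the textbook baseline under the assumption that every set is held in memory as an explicit Python set; it is not the paper's turnstile-stream algorithm, and the name `greedy_max_coverage` is hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(sets: Dict[Hashable, Set[Hashable]], k: int) -> List[Hashable]:
    """Hypothetical helper: choose up to k set ids, each time taking the set
    with the largest marginal gain (number of elements not yet covered)."""
    remaining = dict(sets)  # shallow copy so the caller's dict is untouched
    covered: Set[Hashable] = set()
    chosen: List[Hashable] = []
    for _ in range(min(k, len(remaining))):
        # Ties go to the earliest-inserted id, since max() returns the first maximum.
        best_id = max(remaining, key=lambda sid: len(remaining[sid] - covered))
        if not remaining[best_id] - covered:
            break  # no remaining set adds a new element
        chosen.append(best_id)
        covered |= remaining.pop(best_id)
    return chosen


if __name__ == "__main__":
    example = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
    print(greedy_max_coverage(example, 2))  # prints ['c', 'a']
```

In the turnstile model, entries can be both inserted and deleted, so sets cannot simply be stored and rescanned like this; the paper's algorithm names (e.g. sketchy-submodular-maximization) suggest explicit sets are replaced by sketches, whose details are not reproduced here.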
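The Experiment Setup row describes running k-means with 3 clusters on only the selected features. A minimal, dependency-free Python sketch of that final step follows, assuming the feature indices have already been chosen; the toy rows and the indices `[0, 2]` are hypothetical and are not the Wine data, and the L0 sketch that performs the selection is not modeled. It uses Lloyd's algorithm with deterministic farthest-first seeding; the paper does not state which k-means implementation or initialization it used.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def project(rows: Sequence[Sequence[float]], features: Sequence[int]) -> List[Point]:
    """Keep only the selected feature columns of each row."""
    return [tuple(float(row[j]) for j in features) for row in rows]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points: List[Point], k: int, max_iters: int = 100) -> List[int]:
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


if __name__ == "__main__":
    # Hypothetical toy rows: features 0 and 2 separate three groups; 1 and 3 are noise.
    rows = [
        [0.0, 9.0, 0.0, 5.0], [0.2, 1.0, 0.1, 7.0], [0.1, 4.0, 0.3, 2.0],
        [10.0, 3.0, 10.0, 8.0], [10.2, 6.0, 9.9, 1.0], [9.8, 2.0, 10.1, 4.0],
        [20.0, 7.0, 0.0, 6.0], [20.1, 5.0, 0.2, 3.0], [19.9, 8.0, 0.1, 9.0],
    ]
    print(kmeans(project(rows, [0, 2]), k=3))  # prints [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

Farthest-first seeding keeps the example deterministic; a library routine such as scikit-learn's `KMeans`, whose default initialization is k-means++, would be the more usual choice in practice.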