Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures

Authors: Alina Ene, Alessandro Epasto, Vahab Mirrokni, Hoai-An Nguyen, Huy Nguyen, David Woodruff, Peilin Zhong

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical evaluation confirms the practicality of our fingerprinting algorithms demonstrating a speedup of up to 210x over prior work. Experimental Results. We illustrate the practicality of our fingerprinting algorithms by running experiments on two different datasets.
Researcher Affiliation Collaboration Alina Ene 1 Alessandro Epasto 2 Vahab Mirrokni 2 Hoai-An Nguyen 3 Huy L. Nguyen 4 David P. Woodruff 2 3 Peilin Zhong 2. 1Boston University 2Google Research 3Carnegie Mellon University 4Northeastern University.
Pseudocode Yes Algorithm 1 building-A (n d matrix A, ε (0, 1), k) ... Algorithm 2 max-coverage ... Algorithm 3 A (k, ε, δ) ... Algorithm 4 k-cover ... Algorithm 5 Max-Coverage-LS (n d matrix A, ε (0, 1), k) ... Algorithm 6 sketchy-submodular-maximization ... Algorithm 7 p-Tuples-Sketch (n 1 vector x, constant integer p 2, γ, δ (0, 1)) ... Algorithm 8 general-fingerprinting-sketch (n d matrix A, ε (0, 1), k 0)
Open Source Code Yes All experiments were run locally on a M2 Mac Book Air. The code can be found here.
Open Datasets Yes We use two publicly-available datasets, the UC Irvine Adult and US Census Data (1990) (Becker & Kohavi, 1996; Meek et al.). Aeberhard, S. and Forina, M. Wine. UCI Machine Learning Repository, 1991. DOI: https://doi.org/10.24432/C5PC7J. Becker, B. and Kohavi, R. Adult. UCI Machine Learning Repository, 1996. DOI: https://doi.org/10.24432/C5XW20. Meek, C., Thiesson, B., and Heckerman, D. US Census Data (1990). UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5VP42.
Dataset Splits No The paper does not explicitly provide specific dataset split information (e.g., percentages, sample counts for training, validation, or test sets).
Hardware Specification Yes All experiments were run locally on a M2 Mac Book Air.
Software Dependencies No The paper mentions the implementation of a baseline (
Experiment Setup Yes The main variable we vary is the size of our L0 sketch, specifically with 300, 600, 900, and 1, 250 rows. Then, we ran k-means with 3 clusters (for the 3 wine types) using just the selected features.