Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures
Authors: Alina Ene, Alessandro Epasto, Vahab Mirrokni, Hoai-An Nguyen, Huy Nguyen, David Woodruff, Peilin Zhong
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation confirms the practicality of our fingerprinting algorithms demonstrating a speedup of up to 210x over prior work. Experimental Results. We illustrate the practicality of our fingerprinting algorithms by running experiments on two different datasets. |
| Researcher Affiliation | Collaboration | Alina Ene 1 Alessandro Epasto 2 Vahab Mirrokni 2 Hoai-An Nguyen 3 Huy L. Nguyen 4 David P. Woodruff 2 3 Peilin Zhong 2. 1Boston University 2Google Research 3Carnegie Mellon University 4Northeastern University. |
| Pseudocode | Yes | Algorithm 1 building-A (n d matrix A, ε (0, 1), k) ... Algorithm 2 max-coverage ... Algorithm 3 A (k, ε, δ) ... Algorithm 4 k-cover ... Algorithm 5 Max-Coverage-LS (n d matrix A, ε (0, 1), k) ... Algorithm 6 sketchy-submodular-maximization ... Algorithm 7 p-Tuples-Sketch (n 1 vector x, constant integer p 2, γ, δ (0, 1)) ... Algorithm 8 general-fingerprinting-sketch (n d matrix A, ε (0, 1), k 0) |
| Open Source Code | Yes | All experiments were run locally on a M2 Mac Book Air. The code can be found here. |
| Open Datasets | Yes | We use two publicly-available datasets, the UC Irvine Adult and US Census Data (1990) (Becker & Kohavi, 1996; Meek et al.). Aeberhard, S. and Forina, M. Wine. UCI Machine Learning Repository, 1991. DOI: https://doi.org/10.24432/C5PC7J. Becker, B. and Kohavi, R. Adult. UCI Machine Learning Repository, 1996. DOI: https://doi.org/10.24432/C5XW20. Meek, C., Thiesson, B., and Heckerman, D. US Census Data (1990). UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5VP42. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (e.g., percentages, sample counts for training, validation, or test sets). |
| Hardware Specification | Yes | All experiments were run locally on a M2 Mac Book Air. |
| Software Dependencies | No | The paper mentions the implementation of a baseline ( |
| Experiment Setup | Yes | The main variable we vary is the size of our L0 sketch, specifically with 300, 600, 900, and 1, 250 rows. Then, we ran k-means with 3 clusters (for the 3 wine types) using just the selected features. |