reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Breaking the Quadratic Barrier: Robust Cardinality Sketches for Adaptive Queries

Authors: Edith Cohen, Mihir Singhal, Uri Stemmer

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our fine-grained approach by comparing the number of queries that can be guaranteed compared with the baseline per-query analysis. We use synthetically generated query sets sampled from Uniform and Pareto distributions with α {1.5, 2} and xm = 1, support size of 10^6 and query set size of 5 * 10^3. For each sketch size k, we match a value of the parameter r = 0.002k^2 (with Robust Est and TRobust Est) and respectively t = 0.002k^2 with baseline analysis. Figure 1 reports the number of queries with the baseline and fine-grained estimators. The respective gain factor is measured by the ratio of the number of queries that can be effectively answered with per-key analysis to the baseline. We observe gains of two orders of magnitude for uniformly sampled query sets. This hold even without tracking using Robust Est. For Pareto query sets, we only TRobust Est, as tracking is necessary because some keys appear in many query sketches. We observe gains of 12 for the very skewed α = 1.5 and 40 with α = 2.
Researcher Affiliation	Collaboration	1Google Research 2School of Computer Science, Tel Aviv University, Israel 3School of Computer Science, UC Berkeley, Berkeley, CA, USA. Correspondence to: Edith Cohen <EMAIL>, Mihir Singhal <EMAIL>, Uri Stemmer <EMAIL>.
Pseudocode	Yes	Algorithm 1: Linear Queries with Individual Privacy Charging Algorithm 2: Bottom-k Cardinality Sketch and Standard Estimator Algorithm 3: Basic Robust Cardinality Estimator Algorithm 4: Tracking Robust Estimator
Open Source Code	No	The paper does not contain any explicit statements or links indicating that the source code for the methodology described in this paper is publicly available.
Open Datasets	No	The paper mentions "synthetically generated query sets sampled from Uniform and Pareto distributions". This describes the method of data generation and the types of distributions used, but does not refer to a specific, publicly accessible dataset nor provides any access information (link, DOI, repository, or citation) for a dataset.
Dataset Splits	No	The paper mentions "synthetically generated query sets sampled from Uniform and Pareto distributions with α {1.5, 2} and xm = 1, support size of 10^6 and query set size of 5 * 10^3". This describes characteristics of the generated data, but it does not specify any training, validation, or test splits for this data, nor does it refer to any standard dataset splits.
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running its experiments, such as GPU models, CPU models, or other computing resources.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers that would be needed to replicate the experiments.
Experiment Setup	Yes	We use synthetically generated query sets sampled from Uniform and Pareto distributions with α {1.5, 2} and xm = 1, support size of 10^6 and query set size of 5 * 10^3. For each sketch size k, we match a value of the parameter r = 0.002k^2 (with Robust Est and TRobust Est) and respectively t = 0.002k^2 with baseline analysis.