Testing with Non-identically Distributed Samples

Authors: Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We examine the extent to which sublinear-sample property testing and estimation apply to settings where samples are independently but not identically distributed. Specifically, we consider the following distributional property testing framework: Suppose there is a set of distributions over a discrete support of size k, p_1, p_2, ..., p_T, and we obtain c independent draws from each distribution. Suppose the goal is to learn or test a property of the average distribution, p_avg. ... Our first main result is that with just c = 2 samples from each distribution, we recover the full strength of sublinear sample uniformity (and identity) testing. ... The main part of the proof constitutes bounding the variance of the estimator that we construct in order to show that it concentrates around its mean.
Researcher Affiliation | Collaboration | Shivam Garg (Microsoft Research); Chirag Pabbaraju (Stanford University); Kirankumar Shiragur (Microsoft Research); Gregory Valiant (Stanford University)
Pseudocode | No | The paper describes algorithms for uniformity testing and identity testing using mathematical formulas and prose, such as in Section 3 'Uniformity testing from non-identical samples' and Section 4 'Identity testing from non-identical samples', but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to code repositories or supplementary materials containing code.
Open Datasets | No | The paper is theoretical and analyzes properties of distributions. It does not use or refer to any specific datasets, public or otherwise, for empirical evaluation.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets. Therefore, there are no mentions of training, testing, or validation dataset splits.
Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and algorithms for property testing. It does not describe any computational experiments or the hardware used to perform them.
Software Dependencies | No | The paper is theoretical and does not describe any computational implementations or experiments. Consequently, no specific software dependencies or their version numbers are mentioned.
Experiment Setup | No | The paper is purely theoretical, presenting mathematical analysis, proofs, and algorithm design for property testing. It does not include details on experimental setup, hyperparameters, or training configurations.
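
The sampling framework quoted in the Research Type row (T distributions over a support of size k, c = 2 draws from each, a property of the average distribution) can be illustrated with a small sketch. The code below is not taken from the paper, which provides no pseudocode; it is one natural collision-based statistic, written under the assumption that an unbiased estimate of the squared L2 norm of p_avg is the quantity of interest. The function names, the 1/4 weighting of cross-pair collisions, and the threshold (1 + 2*eps^2)/k are illustrative choices, and no sample-complexity guarantee is claimed for them.

```python
import random
from collections import Counter


def draw_pairs(dists, rng):
    """Draw c = 2 independent samples from each distribution in `dists`.

    `dists` is a list of T probability vectors of length k (hypothetical input format).
    """
    k = len(dists[0])
    return [tuple(rng.choices(range(k), weights=p, k=2)) for p in dists]


def collision_estimate(pairs):
    """Unbiased estimate of ||p_avg||_2^2 from one pair of samples per distribution.

    Within-pair collisions estimate sum_t ||p_t||^2. For s != t, the four
    cross comparisons are averaged (weight 1/4) to estimate <p_s, p_t>.
    """
    T = len(pairs)
    within = sum(1 for a, b in pairs if a == b)
    counts = Counter(x for pair in pairs for x in pair)
    # Ordered pairs of distinct samples that share a value, over all 2T samples.
    all_ordered = sum(n * (n - 1) for n in counts.values())
    # Remove the within-pair matches (each counted twice as ordered pairs).
    cross_ordered = all_ordered - 2 * within
    return (within + cross_ordered / 4) / (T * T)


def looks_uniform(pairs, k, eps):
    """Accept if the estimate is below the midpoint threshold (an assumed choice).

    If p_avg is uniform, ||p_avg||_2^2 = 1/k; if it is eps-far in total
    variation, ||p_avg||_2^2 >= (1 + 4*eps**2) / k.
    """
    return collision_estimate(pairs) <= (1 + 2 * eps ** 2) / k


if __name__ == "__main__":
    rng = random.Random(0)
    k, T = 10, 2000
    # Non-identical point masses whose average is exactly uniform.
    dists = [[1.0 if j == t % k else 0.0 for j in range(k)] for t in range(T)]
    pairs = draw_pairs(dists, rng)
    print(collision_estimate(pairs), looks_uniform(pairs, k, eps=0.25))
```

The point-mass example in the demo shows why the two samples per distribution matter: every individual distribution is as far from uniform as possible, yet the within-pair and cross-pair collision counts combine to reflect the average distribution rather than any single p_t.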