reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Testing with Non-identically Distributed Samples

Authors: Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We examine the extent to which sublinear-sample property testing and estimation apply to settings where samples are independently but not identically distributed. Specifically, we consider the following distributional property testing framework: Suppose there is a set of distributions over a discrete support of size $k$, $p_1, p_2, \ldots, p_T$, and we obtain $c$ independent draws from each distribution. Suppose the goal is to learn or test a property of the average distribution, $p_{\mathrm{avg}}$. ... Our first main result is that with just $c = 2$ samples from each distribution, we recover the full strength of sublinear sample uniformity (and identity) testing. ... The main part of the proof constitutes bounding the variance of the estimator that we construct in order to show that it concentrates around its mean.
Researcher Affiliation	Collaboration	Shivam Garg (Microsoft Research), Chirag Pabbaraju (Stanford University), Kirankumar Shiragur (Microsoft Research), Gregory Valiant (Stanford University)
Pseudocode	No	The paper describes algorithms for uniformity testing and identity testing using mathematical formulas and prose, such as in Section 3 'Uniformity testing from non-identical samples' and Section 4 'Identity testing from non-identical samples', but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to code repositories or supplementary materials containing code.
Open Datasets	No	The paper is theoretical and analyzes properties of distributions. It does not use or refer to any specific datasets, public or otherwise, for empirical evaluation.
Dataset Splits	No	The paper is theoretical and does not involve empirical experiments with datasets. Therefore, there are no mentions of training, testing, or validation dataset splits.
Hardware Specification	No	The paper is theoretical and focuses on mathematical proofs and algorithms for property testing. It does not describe any computational experiments or the hardware used to perform them.
Software Dependencies	No	The paper is theoretical and does not describe any computational implementations or experiments. Consequently, no specific software dependencies or their version numbers are mentioned.
Experiment Setup	No	The paper is purely theoretical, presenting mathematical analysis, proofs, and algorithm design for property testing. It does not include details on experimental setup, hyperparameters, or training configurations.
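
The sampling model quoted in the Research Type row (two independent draws from each of T distributions over a support of size k, with interest in the average distribution) can be illustrated with a small sketch. The code below is not taken from the paper and is not claimed to be the authors' estimator. It uses a generic collision-counting identity: same-source collisions estimate the sum of the squared norms of the individual distributions, and cross-source collisions estimate the remaining inner products, so together they give an unbiased estimate of the squared L2 norm of the average distribution, which equals 1/k exactly when the average is uniform. The function names `draw_pairs` and `estimate_avg_collision_prob` are hypothetical.

```python
import random
from collections import Counter


def draw_pairs(dists, rng):
    """Draw c = 2 independent samples from each distribution.

    `dists` is a list of T probability vectors over the support {0, ..., k-1}.
    """
    support = range(len(dists[0]))
    return [tuple(rng.choices(support, weights=p, k=2)) for p in dists]


def estimate_avg_collision_prob(pairs):
    """Unbiased estimate of ||p_avg||_2^2 from one pair of draws per source.

    Illustrative collision statistic only (an assumption of this sketch,
    not necessarily the estimator analyzed in the paper).
    """
    T = len(pairs)
    # Collisions between the two draws of the same source:
    # the expected count is sum_i ||p_i||^2.
    within = sum(1 for a, b in pairs if a == b)
    # All collisions among the 2T pooled samples.
    counts = Counter(x for pair in pairs for x in pair)
    total = sum(n * (n - 1) // 2 for n in counts.values())
    # Collisions between draws of different sources:
    # the expected count is 2 * sum_{i != j} <p_i, p_j>.
    cross = total - within
    return (within + cross / 2) / T**2


if __name__ == "__main__":
    rng = random.Random(0)
    k, T = 4, 2000
    # Each source is a point mass, yet the average distribution is uniform.
    dists = [[1.0 if x == i % k else 0.0 for x in range(k)] for i in range(T)]
    pairs = draw_pairs(dists, rng)
    print(estimate_avg_collision_prob(pairs), "vs 1/k =", 1 / k)
```

A tester built on such a statistic would compare the estimate to 1/k with a threshold. The threshold and the number of sources needed are exactly what a variance analysis has to supply, and this sketch does not address them.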