Active Sequential Two-Sample Testing

Authors: Weizhi Li, Prad Kadambi, Pouria Saidi, Karthikeyan Natesan Ramamurthy, Gautam Dasarathy, Visar Berisha

TMLR 2024

Reproducibility assessment (variable, result, and supporting evidence from the paper):

Research Type: Experimental
    "In practice, we introduce an instantiation of our framework and evaluate it using several experiments; the experiments on the synthetic, MNIST, and application-specific datasets demonstrate that the testing power of the instantiated active sequential test significantly increases while the Type I error is under control."

Researcher Affiliation: Collaboration
    Weizhi Li (Arizona State University; Los Alamos National Laboratory), Prad Kadambi (Arizona State University), Pouria Saidi (Arizona State University), Karthikeyan Natesan Ramamurthy (IBM Research), Gautam Dasarathy (Arizona State University), Visar Berisha (Arizona State University).

Pseudocode: Yes
    "Algorithm 1 Bimodal Query Based Active Sequential Two-Sample Testing (BQ-AST)"

Open Source Code: No
    The paper neither states that code is released nor links to a repository for the described methodology.

Open Datasets: Yes
    "The experiments on the synthetic, MNIST, and application-specific datasets demonstrate that the testing power of the instantiated active sequential test significantly increases while the Type I error is under control." "We demonstrate the utility of the proposed test in a clinical application using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (Jack Jr et al., 2008)."

Dataset Splits: No
    The paper reports the sizes of the unlabeled sets (e.g., "Each case of data is of size 2000 with labels masked, resulting in an unlabeled set Su with |Su| = 2000"), the number of initial labeled samples (N0 = 10), and the total label budget (Nq), but it does not specify fixed training/validation/test splits in the traditional sense, since the experiment proceeds by sequential active labeling.

Hardware Specification: No
    The paper does not report the hardware used to run the experiments, such as GPU/CPU models or memory.

Software Dependencies: No
    "In this section, we compare the BQ-AST with a sequential testing baseline (Lhéritier & Cazals, 2018) that uses the same statistic in equation 2, but the baseline labels features randomly sampled from the unlabeled set Su. In addition, we build Q(z | s) for the test statistic in equation 2 using logistic regression, SVM, or KNN classifiers; we set N0 = 10 for the number of label queries used to initialize Q(z | s), and set significance level α = 0.05." The paper names the classifiers but does not specify software packages or versions.

Experiment Setup: Yes
    "In addition, we build Q(z | s) for the test statistic in equation 2 using logistic regression, SVM, or KNN classifiers; we set N0 = 10 for the number of label queries used to initialize Q(z | s), and set significance level α = 0.05."
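To make the assessed setup concrete, the following is a minimal, hedged sketch of a classifier-based active sequential two-sample test in the spirit of what the report quotes: a label predictor Q(z | s) is initialized from N0 = 10 labeled samples, labels are then queried actively, and a running likelihood ratio against the uninformative null predictor P(z) = 1/2 is compared with 1/α (α = 0.05). This is an illustrative assumption, not the paper's Algorithm 1: the Gaussian class-conditional predictor, the `bimodal_query` helper (which here simply queries the feature with the most extreme predicted posterior), and all function names are stand-ins for the paper's logistic regression/SVM/KNN instantiations and its actual query rule.

```python
import math
import random

def predict_q1(seen, s):
    """Toy Q(z=1 | s): class-conditional 1-D Gaussians fit to labels seen so far.

    Stand-in for the paper's logistic regression / SVM / KNN predictors.
    """
    xs0 = [x for x, z in seen if z == 0]
    xs1 = [x for x, z in seen if z == 1]
    if not xs0 or not xs1:
        return 0.5  # no evidence yet: fall back to the null predictor

    def dens(xs, x):
        m = sum(xs) / len(xs)
        v = sum((u - m) ** 2 for u in xs) / len(xs) + 1e-6
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    p1 = len(xs1) / len(seen)
    num = p1 * dens(xs1, s)
    return num / ((1 - p1) * dens(xs0, s) + num + 1e-12)

def bimodal_query(pool, seen):
    """Illustrative active rule: query the feature with the most extreme posterior."""
    return max(pool, key=lambda s: abs(predict_q1(seen, s) - 0.5))

def sequential_test(pool, oracle, alpha=0.05, n_init=10, budget=60):
    """Sequentially spend label queries; reject H0 (equal distributions) as soon
    as the likelihood ratio of Q against P(z) = 1/2 exceeds 1/alpha."""
    pool = list(pool)
    random.shuffle(pool)
    seen = [(s, oracle(s)) for s in pool[:n_init]]  # N0 labels to initialize Q
    pool = pool[n_init:]
    log_ratio = 0.0
    for t in range(min(budget, len(pool))):
        s = bimodal_query(pool, seen)
        pool.remove(s)
        q1 = predict_q1(seen, s)            # predict before revealing the label
        z = oracle(s)                       # spend one label query
        q = q1 if z == 1 else 1.0 - q1
        log_ratio += math.log(max(q, 1e-12)) - math.log(0.5)
        seen.append((s, z))
        if log_ratio >= math.log(1.0 / alpha):
            return True, len(seen)          # reject H0
    return False, len(seen)                 # budget exhausted, fail to reject
```

Because Q predicts each label from past data only, the running ratio is a nonnegative martingale under H0, so the 1/α stopping threshold keeps the Type I error at most α (Ville's inequality); this is the standard argument behind sequential ratio tests of this form, and it holds regardless of which features the active rule chooses to label.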