Active Sequential Two-Sample Testing

Authors: Weizhi Li, Prad Kadambi, Pouria Saidi, Karthikeyan Natesan Ramamurthy, Gautam Dasarathy, Visar Berisha

TMLR 2024

Reproducibility assessment (variable, result, and supporting evidence from the paper):

Research Type: Experimental
    "In practice, we introduce an instantiation of our framework and evaluate it using several experiments; the experiments on the synthetic, MNIST, and application-specific datasets demonstrate that the testing power of the instantiated active sequential test significantly increases while the Type I error is under control."

Researcher Affiliation: Collaboration
    Weizhi Li (Arizona State University; Los Alamos National Laboratory), Prad Kadambi (Arizona State University), Pouria Saidi (Arizona State University), Karthikeyan Natesan Ramamurthy (IBM Research), Gautam Dasarathy (Arizona State University), Visar Berisha (Arizona State University).

Pseudocode: Yes
    "Algorithm 1 Bimodal Query Based Active Sequential Two-Sample Testing (BQ-AST)"

Open Source Code: No
    The paper neither states that code is released nor links to a repository for the described methodology.

Open Datasets: Yes
    "The experiments on the synthetic, MNIST, and application-specific datasets demonstrate that the testing power of the instantiated active sequential test significantly increases while the Type I error is under control." "We demonstrate the utility of the proposed test in a clinical application using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (Jack Jr et al., 2008)."

Dataset Splits: No
    The paper reports the sizes of the unlabeled sets (e.g., "Each case of data is of size 2000 with labels masked, resulting in an unlabeled set Su with |Su| = 2000"), the number of initial labeled samples (N0 = 10), and the total label budget (Nq), but it does not specify fixed training/validation/test splits in the traditional sense, since the experiment proceeds by sequential active labeling.

Hardware Specification: No
    The paper does not report the hardware used to run the experiments, such as GPU/CPU models or memory.

Software Dependencies: No
    "In this section, we compare the BQ-AST with a sequential testing baseline (Lhéritier & Cazals, 2018) that uses the same statistic in equation 2, but the baseline labels features randomly sampled from the unlabeled set Su. In addition, we build Q(z | s) for the test statistic in equation 2 using logistic regression, SVM, or KNN classifiers; we set N0 = 10 for the number of label queries used to initialize Q(z | s), and set significance level α = 0.05." The paper names the classifiers but does not specify software packages or versions.

Experiment Setup: Yes
    "In addition, we build Q(z | s) for the test statistic in equation 2 using logistic regression, SVM, or KNN classifiers; we set N0 = 10 for the number of label queries used to initialize Q(z | s), and set significance level α = 0.05."
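To make the assessed setup concrete, the following is a minimal, hedged sketch of a classifier-based active sequential two-sample test in the spirit of what the report quotes: a label predictor Q(z | s) is initialized from N0 = 10 labeled samples, labels are then queried actively, and a running likelihood ratio against the uninformative null predictor P(z) = 1/2 is compared with 1/α (α = 0.05). This is an illustrative assumption, not the paper's Algorithm 1: the Gaussian class-conditional predictor, the `bimodal_query` helper (which here simply queries the feature with the most extreme predicted posterior), and all function names are stand-ins for the paper's logistic regression/SVM/KNN instantiations and its actual query rule.

```python
import math
import random

def predict_q1(seen, s):
    """Toy Q(z=1 | s): class-conditional 1-D Gaussians fit to labels seen so far.

    Stand-in for the paper's logistic regression / SVM / KNN predictors.
    """
    xs0 = [x for x, z in seen if z == 0]
    xs1 = [x for x, z in seen if z == 1]
    if not xs0 or not xs1:
        return 0.5  # no evidence yet: fall back to the null predictor

    def dens(xs, x):
        m = sum(xs) / len(xs)
        v = sum((u - m) ** 2 for u in xs) / len(xs) + 1e-6
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    p1 = len(xs1) / len(seen)
    num = p1 * dens(xs1, s)
    return num / ((1 - p1) * dens(xs0, s) + num + 1e-12)

def bimodal_query(pool, seen):
    """Illustrative active rule: query the feature with the most extreme posterior."""
    return max(pool, key=lambda s: abs(predict_q1(seen, s) - 0.5))

def sequential_test(pool, oracle, alpha=0.05, n_init=10, budget=60):
    """Sequentially spend label queries; reject H0 (equal distributions) as soon
    as the likelihood ratio of Q against P(z) = 1/2 exceeds 1/alpha."""
    pool = list(pool)
    random.shuffle(pool)
    seen = [(s, oracle(s)) for s in pool[:n_init]]  # N0 labels to initialize Q
    pool = pool[n_init:]
    log_ratio = 0.0
    for t in range(min(budget, len(pool))):
        s = bimodal_query(pool, seen)
        pool.remove(s)
        q1 = predict_q1(seen, s)            # predict before revealing the label
        z = oracle(s)                       # spend one label query
        q = q1 if z == 1 else 1.0 - q1
        log_ratio += math.log(max(q, 1e-12)) - math.log(0.5)
        seen.append((s, z))
        if log_ratio >= math.log(1.0 / alpha):
            return True, len(seen)          # reject H0
    return False, len(seen)                 # budget exhausted, fail to reject
```

Because Q predicts each label from past data only, the running ratio is a nonnegative martingale under H0, so the 1/α stopping threshold keeps the Type I error at most α (Ville's inequality); this is the standard argument behind sequential ratio tests of this form, and it holds regardless of which features the active rule chooses to label.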