Neyman-Pearson classification: parametrics and sample size requirement

Authors: Xin Tong, Lucy Xia, Jiacheng Wang, Yang Feng

JMLR 2020

Each entry below gives a reproducibility variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. Through extensive simulations and real data analysis, we study the performance of NP-LDA, NP-sLDA, pNP-LDA and pNP-sLDA. In addition, we will study the new adaptive sample splitting scheme. In this section, N0 denotes the total class 0 training sample size (we do not use n0 and n0' here, as class 0 observations are not assumed to be pre-divided into two parts), n1 denotes the class 1 training sample size, and N = N0 + n1 denotes the total sample size.
Researcher Affiliation: Academia. Xin Tong (EMAIL), Department of Data Sciences and Operations, Marshall School of Business, University of Southern California; Lucy Xia (EMAIL), Department of ISOM, School of Business and Management, Hong Kong University of Science and Technology; Jiacheng Wang (EMAIL), Department of Statistics, University of Chicago; Yang Feng (EMAIL), Department of Biostatistics, School of Global Public Health, New York University.
Pseudocode: No. The paper describes algorithms and procedures in descriptive text, such as the "NP umbrella algorithm" or the "adaptive sample splitting scheme", but does not present them in a structured pseudocode block or a clearly labeled algorithm format.
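The NP umbrella algorithm referenced here has a compact order-statistic form: given n held-out class 0 scores, choose the smallest index k such that thresholding at the k-th smallest score keeps the probability of the type I error exceeding α below the tolerance δ. A minimal Python sketch of that rule, assuming the standard binomial violation bound from the NP umbrella literature (the function name and the linear scan over k are our own, not the paper's):

```python
from math import comb

def umbrella_order_stat(n, alpha, delta):
    """Smallest 1-based index k of the n ascending held-out class 0
    scores such that using the k-th order statistic as the threshold
    satisfies P(type I error > alpha) <= delta; None if n is too small."""
    for k in range(1, n + 1):
        # Binomial upper bound on the violation probability for the
        # k-th order statistic; it decreases as k grows.
        v = sum(comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
                for j in range(k, n + 1))
        if v <= delta:
            return k
    return None  # no order statistic achieves the (alpha, delta) guarantee
```

For α = δ = 0.1 the smallest usable n is 22, where only the maximum score (k = 22) satisfies the bound; this matches the known minimum-sample-size requirement n ≥ log δ / log(1 − α).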
Open Source Code: Yes. The proposed NP classifiers are implemented in the R package nproc.
Open Datasets: Yes. For instance, a commonly-used lung cancer diagnosis example (Gordon et al., 2002) in the high-dimensional statistics literature... The first is a neuroblastoma dataset containing d = 43,827 gene expression measurements from N = 498 neuroblastoma samples generated by the Sequencing Quality Control (SEQC) consortium (Wang et al., 2014). ... The second dataset is a high-dimensional breast cancer dataset (d = 22,215, N = 118) (Chin et al., 2006).
Dataset Splits: Yes. We randomly split the dataset 1,000 times into a training set (70%) and a test set (30%), and then train the NP classifiers on each training set and compute their empirical type I and type II errors over the corresponding test data. ... We randomly split the dataset 1,000 times into a training set (2/3) and a test set (1/3), train the two methods on the training set, and compute the empirical type I and type II errors on the corresponding test set.
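The repeated random-split evaluation described above is a generic protocol that can be sketched independently of the paper's specific classifiers. A hedged Python sketch, assuming synthetic one-dimensional Gaussian scores and a naive class 0 quantile threshold standing in for the trained NP classifiers (all names and data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: class 0 scores sit lower than class 1 scores.
X0 = rng.normal(0.0, 1.0, size=300)   # class 0 (the error-controlled class)
X1 = rng.normal(2.0, 1.0, size=300)   # class 1

def split_once(x, frac, rng):
    """One random split of x into (train, test) with train fraction frac."""
    idx = rng.permutation(len(x))
    cut = int(frac * len(x))
    return x[idx[:cut]], x[idx[cut:]]

type1, type2 = [], []
for _ in range(1000):                     # 1,000 random 70/30 splits
    tr0, te0 = split_once(X0, 0.7, rng)
    tr1, te1 = split_once(X1, 0.7, rng)
    thr = np.quantile(tr0, 0.9)           # naive 90% class 0 quantile threshold
    type1.append(np.mean(te0 > thr))      # class 0 points flagged as class 1
    type2.append(np.mean(te1 <= thr))     # class 1 points flagged as class 0
mean_type1, mean_type2 = float(np.mean(type1)), float(np.mean(type2))
```

Averaging the empirical errors over the 1,000 splits is what produces the summary numbers reported in such experiments; the NP methods differ from this naive quantile rule precisely in giving a high-probability guarantee on the type I error rather than only controlling it on average.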
Hardware Specification: Yes. All numerical experiments were performed on an HP Enterprise XL170r with CPU E5-2650v4 (2.20 GHz) and 16 GB memory.
Software Dependencies: No. The paper mentions that the proposed NP classifiers are implemented in the R package nproc, but it does not specify the version of R or any other software dependencies with version numbers.
Experiment Setup: Yes. In all NP methods, τ, the class 0 split proportion, is fixed at 0.5. In every simulation setting, the experiments are repeated 1,000 times. ... We set π0 = π1 = 0.5 and α = δ0 = 0.1. ... We set α = 0.2 and δ0 = 0.1. ... We set α = δ0 = 0.1, and train an enormous number of NP classifiers.
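The role of τ = 0.5 can be made concrete: one half of the class 0 sample would go into fitting the scoring function, while the other half calibrates the type-I-error threshold. A hypothetical sketch under the paper's α = δ0 = 0.1 setup, assuming Gaussian scores in place of a fitted classifier and the standard binomial order-statistic rule for the threshold (variable names and data are illustrative, not the paper's):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
alpha, delta0, tau = 0.1, 0.1, 0.5      # values used in the paper's setup

# Hypothetical 1-D scores standing in for a fitted classifier's output.
x0 = rng.normal(0.0, 1.0, size=200)     # class 0 training scores

# tau = 0.5: half of class 0 would fit the scoring function (omitted
# here); the other half calibrates the type-I-error threshold.
half = int(tau * len(x0))
calib = np.sort(x0[half:])              # ascending calibration scores
n = len(calib)

# Smallest order-statistic index whose binomial violation bound is <= delta0.
k = next(k for k in range(1, n + 1)
         if sum(comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
                for j in range(k, n + 1)) <= delta0)
threshold = calib[k - 1]                # classify as class 1 above this score
```

With n = 100 calibration points the selected k sits well above the naive 90th-percentile index, which is the price paid for the high-probability (rather than on-average) type I error guarantee.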