Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification
Authors: Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also perform an experimental case study on a label shift dataset and find that in line with our theory, the test accuracy of robust neural network classifiers is constrained by the number of minority samples. (...) Figure 2: Convolutional neural network classifiers trained on the Imbalanced Binary CIFAR10 dataset with a 5:1 label imbalance. |
| Researcher Affiliation | Academia | Niladri S. Chatterji EMAIL Department of Computer Science Stanford University Saminul Haque EMAIL Department of Computer Science Stanford University Tatsunori B. Hashimoto EMAIL Department of Computer Science Stanford University |
| Pseudocode | Yes | Undersampled binning estimator: The undersampled binning estimator A_USB takes as input a dataset S and a positive integer K corresponding to the number of bins, and returns a classifier A^{S,K}_{USB} : [0, 1] → {−1, 1}. This estimator is defined as follows: 1. First, we compute the undersampled dataset S_US. 2. Given this dataset S_US, let n_{1,j} be the number of points with label +1 that lie in the interval I_j = [(j−1)/K, j/K]. Also, define n_{−1,j} analogously. Then set A_j = 1 if n_{1,j} > n_{−1,j}, and −1 otherwise. 3. Define the classifier A^{S,K}_{USB} such that if x ∈ I_j then A^{S,K}_{USB}(x) = A_j. (5). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | To explore this question, we conduct a small case study using the imbalanced binary CIFAR10 dataset (Byrd & Lipton, 2019; Wang et al., 2022) that is constructed using the cat and dog classes. |
| Dataset Splits | Yes | The test set consists of all of the 1000 cat and 1000 dog test examples. To form our initial train and validation sets, we take 2500 cat examples but only 500 dog examples from the official train set, corresponding to a 5:1 label imbalance. We then use 80% of those examples for training and the rest for validation. |
| Hardware Specification | No | We note that all of the experiments were performed on an internal cluster on 8 GPUs. This statement is too general and does not provide specific hardware models or specifications. |
| Software Dependencies | No | The paper mentions using SGD and various loss functions (importance weighted cross entropy loss, importance weighted VS loss, tilted loss, group-DRO) but does not specify any software libraries or frameworks with their version numbers. |
| Experiment Setup | Yes | We train this model using SGD for 800 epochs with batch size 64, a constant learning rate of 0.001, and momentum 0.9. The importance weights used to upweight the minority class samples in the training and validation losses are calculated as #Cat Train Examples / #Dog Train Examples. We set τ = 3 and γ = 0.3, the best hyperparameters identified by Wang et al. (2022) on this dataset for this neural network architecture. |
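The undersampled binning estimator quoted in the Pseudocode row above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the NumPy-based undersampling step, and the bin-assignment convention (half-open bins, ties broken toward −1) are our assumptions from the quoted description.

```python
import numpy as np

def undersampled_binning_estimator(x, y, K, rng=None):
    """Sketch of the undersampled binning estimator A_USB (hypothetical
    implementation of the three steps quoted from the paper).

    x : 1-D array of covariates in [0, 1]
    y : 1-D array of labels in {-1, +1}
    K : number of bins
    Returns a classifier mapping points in [0, 1] to {-1, +1}.
    """
    rng = np.random.default_rng(rng)

    # Step 1: form the undersampled dataset S_US by subsampling the
    # majority class down to the size of the minority class.
    pos, neg = np.where(y == 1)[0], np.where(y == -1)[0]
    m = min(len(pos), len(neg))
    keep = np.concatenate([rng.choice(pos, m, replace=False),
                           rng.choice(neg, m, replace=False)])
    x_us, y_us = x[keep], y[keep]

    # Step 2: count labels per bin I_j = [(j-1)/K, j/K); n_plus[j] and
    # n_minus[j] are the counts of +1 and -1 points landing in bin j.
    bins = np.clip((x_us * K).astype(int), 0, K - 1)
    n_plus = np.bincount(bins[y_us == 1], minlength=K)
    n_minus = np.bincount(bins[y_us == -1], minlength=K)

    # A_j = +1 if n_plus[j] > n_minus[j], else -1 (majority vote per bin).
    A = np.where(n_plus > n_minus, 1, -1)

    # Step 3: the classifier assigns every x in bin I_j the label A_j.
    def classify(x_new):
        j = np.clip((np.asarray(x_new) * K).astype(int), 0, K - 1)
        return A[j]
    return classify
```

With a 5:1 imbalanced toy dataset whose classes separate at x = 0.5, the fitted classifier predicts +1 on the left half and −1 on the right half regardless of which majority-class subsample is kept, since every bin remains label-pure after undersampling.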