Selective Classification Under Distribution Shifts

Authors: Hengyue Liang, Le Peng, Ju Sun

TMLR 2024

Reproducibility Variable — Result — Supporting excerpt
Research Type — Experimental. Supporting excerpt: "Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than the existing ones for generalized SC on a variety of classification tasks and DL classifiers. The code is available at https://github.com/sun-umn/sc_with_distshift." Also, Section 4 (Experiments): "In this section, we experiment with various multiclass classification tasks and recent DNN classifiers to verify the effectiveness of our margin-based score functions for generalized SC."
Researcher Affiliation — Academia. Supporting excerpt: "Hengyue Liang (EMAIL), Department of Electrical and Computer Engineering, University of Minnesota; Le Peng (EMAIL), Department of Computer Science and Engineering, University of Minnesota; Ju Sun (EMAIL), Department of Computer Science and Engineering, University of Minnesota."
Pseudocode — Yes. Supporting excerpt: "Algorithm 1: Non-training-based selective classification. Algorithm 2: Typical OOD detection pipeline (e.g., Sun et al. (2021))."
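Algorithm 1 is only named here, not reproduced. As a rough illustration of what non-training-based selective classification does, the sketch below accepts or abstains on a prediction by thresholding a confidence score; it uses the maximum softmax probability as a stand-in score function, whereas the paper proposes margin-based scores that are not reproduced here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def selective_classify(logits, threshold):
    """Predict the argmax class if the confidence score clears the
    threshold; otherwise abstain (return None).

    Score used here: maximum softmax probability (an illustrative
    choice, not the paper's margin-based score).
    """
    probs = softmax(logits)
    score = max(probs)
    return probs.index(score) if score >= threshold else None
```

For example, `selective_classify([2.0, 0.5, 0.1], threshold=0.7)` accepts and returns class 0, while raising the threshold to 0.9 makes the classifier abstain on the same input.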
Open Source Code — Yes. Supporting excerpt: "The code is available at https://github.com/sun-umn/sc_with_distshift."
Open Datasets — Yes. Supporting excerpt: "our evaluation tasks include (i) ImageNet (Russakovsky et al., 2015), the most widely used testbed for image classification, with a covariate-shifted version ImageNet-C (Hendrycks & Dietterich, 2018) composed of synthetic perturbations, and OpenImage-O (Wang et al., 2022) composed of natural images similar to ImageNet but with disjoint labels, i.e., label-shifted samples; (ii) iWildCam (Beery et al., 2020) test set provides two subsets of animal images taken at different geo-locations; (iii) Amazon (Ni et al., 2019) test set provides two subsets of review comments by different users; (iv) CIFAR-10 (Krizhevsky et al., 2009), a small image classification dataset commonly used in previous training-based SC works, together with CIFAR-10-C (perturbed CIFAR-10) and CIFAR-100 (with disjoint labels from CIFAR-10)."
Dataset Splits — Yes. Supporting excerpt, Table 2: Summary of In-D and distribution-shifted datasets used for SC evaluation.

Task     | In-D (split)       | Classes | In-D samples | Shift-Cov samples               | Shift-Label samples
ImageNet | ILSVRC-2012 (val)  | 1000    | 50,000       | ImageNet-C (severity 3), 50,000 | OpenImage-O, 17,256
iWildCam | iWildCam (id_test) | 178     | 8,154        | iWildCam (ood_test), 42,791     | —
Amazon   | Amazon (id_test)   | 5       | 46,950       | Amazon (test), 100,050          | —
CIFAR    | CIFAR-10 (val)     | 10      | 10,000       | CIFAR-10-C (severity 3), 10,000 | CIFAR-100, 10,000
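The In-D and shifted splits above are scored with selective-classification metrics. A minimal sketch of the standard quantities, coverage (fraction of samples accepted at a score threshold) and selective risk (error rate among accepted samples), is below; the exact metrics the paper reports, such as full risk-coverage curves, may differ in detail.

```python
def selective_metrics(scores, correct, threshold):
    """Return (coverage, selective_risk) at a given score threshold.

    scores  : confidence score per sample (higher = more confident)
    correct : whether the classifier's prediction is correct per sample
    """
    accepted = [c for s, c in zip(scores, correct) if s >= threshold]
    if not accepted:
        return 0.0, 0.0  # nothing accepted: zero coverage, risk undefined -> 0
    coverage = len(accepted) / len(scores)
    risk = sum(1 for c in accepted if not c) / len(accepted)
    return coverage, risk
```

Sweeping the threshold over all observed score values traces out the risk-coverage trade-off used to compare score functions.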
Hardware Specification — No. Supporting excerpt: "The authors acknowledge the Minnesota Supercomputing Institute (MSI) at the University of Minnesota for providing resources that contributed to the research results reported in this article."
Software Dependencies — No. The paper mentions 'timm' (Wightman, 2019) for model retrieval and 'PyTorch' for reimplementing ScNet. However, it does not specify exact version numbers for these software libraries, nor for Python, CUDA, or other dependencies.
Experiment Setup — Yes. Supporting excerpt, Table 7: Key hyperparameters for the ScNet training used in this paper.

Dataset     | Model architecture | Dropout prob. | Target coverage | Batch size | Total epochs | LR (base) | Scheduler
CIFAR-10    | VGG                | 0.3           | 0.7             | 128        | 300          | 0.1       | StepLR
ImageNet-1k | resnet34           | N/A           | 0.7             | 768        | 250          | 0.1       | CosineAnnealingLR
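The two schedulers named in Table 7 are standard learning-rate decay rules. A minimal sketch of each follows; the step size, gamma, and minimum LR are illustrative defaults and are not specified in the excerpt above.

```python
import math

def step_lr(base_lr, epoch, step_size=100, gamma=0.1):
    """StepLR: multiply the LR by `gamma` once every `step_size` epochs.
    step_size and gamma are illustrative, not values from the paper."""
    return base_lr * gamma ** (epoch // step_size)

def cosine_annealing_lr(base_lr, epoch, total_epochs, eta_min=0.0):
    """CosineAnnealingLR: decay from base_lr to eta_min along a half cosine."""
    return eta_min + 0.5 * (base_lr - eta_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )
```

With the Table 7 settings, for example, `cosine_annealing_lr(0.1, epoch, 250)` starts at 0.1 and anneals to 0 over the 250 ImageNet-1k training epochs.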