Selective Classification Under Distribution Shifts
Authors: Hengyue Liang, Le Peng, Ju Sun
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than the existing ones for generalized SC on a variety of classification tasks and DL classifiers. The code is available at https://github.com/sun-umn/sc_with_distshift. Section 4: Experiments. In this section, we experiment with various multiclass classification tasks and recent DNN classifiers to verify the effectiveness of our margin-based score functions for generalized SC. |
| Researcher Affiliation | Academia | Hengyue Liang EMAIL Department of Electrical and Computer Engineering University of Minnesota; Le Peng EMAIL Department of Computer Science and Engineering University of Minnesota; Ju Sun EMAIL Department of Computer Science and Engineering University of Minnesota |
| Pseudocode | Yes | Algorithm 1 Non-training-based selective classification. Algorithm 2 Typical OOD detection pipeline (e.g., Sun et al. (2021)) |
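The non-training-based selective classification loop referenced above (Algorithm 1) boils down to: compute a confidence score per sample, accept and predict when the score clears a threshold, abstain otherwise. A minimal NumPy sketch of that idea, using the common softmax-response score as a stand-in (the paper's margin-based scores would plug in at the same point; function names here are illustrative, not from the released code):

```python
import numpy as np

def selective_classify(logits, threshold):
    """Predict argmax class when the confidence score clears `threshold`,
    otherwise abstain (returned as -1). Softmax response (max probability)
    is used as the score function here; other scores drop in the same way."""
    # Numerically stable softmax over the class dimension.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    scores = probs.max(axis=1)          # confidence score per sample
    preds = probs.argmax(axis=1)        # tentative class prediction
    return np.where(scores >= threshold, preds, -1)

def threshold_for_coverage(scores, coverage):
    """Pick the score threshold that accepts roughly `coverage`
    fraction of samples (e.g. coverage=0.7 keeps the top 70%)."""
    return np.quantile(scores, 1.0 - coverage)
```

Calibrating the threshold on held-out scores and then applying `selective_classify` is the whole non-training-based pipeline: no retraining of the classifier is involved, only a choice of score function and operating point.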
| Open Source Code | Yes | The code is available at https://github.com/sun-umn/sc_with_distshift. |
| Open Datasets | Yes | our evaluation tasks include (i) ImageNet (Russakovsky et al., 2015), the most widely used testbed for image classification, with a covariate-shifted version ImageNet-C (Hendrycks & Dietterich, 2018) composed of synthetic perturbations, and OpenImage-O (Wang et al., 2022) composed of natural images similar to ImageNet but with disjoint labels, i.e., label-shifted samples; (ii) iWildCam (Beery et al., 2020) test set provides two subsets of animal images taken at different geo-locations; (iii) Amazon (Ni et al., 2019) test set provides two subsets of review comments by different users; (iv) CIFAR-10 (Krizhevsky et al., 2009), a small image classification dataset commonly used in previous training-based SC works, together with CIFAR-10-C (perturbed CIFAR-10) and CIFAR-100 (with disjoint labels from CIFAR-10) |
| Dataset Splits | Yes | Table 2: Summary of In-D and distribution-shifted datasets used for our SC evaluation. Task: ImageNet, In-D (split): ILSVRC-2012 (val), classes: 1000, samples: 50,000, Shift-Cov samples: ImageNet-C (severity 3) 50,000, Shift-Label samples: OpenImage-O 17,256. Task: iWildCam, In-D (split): iWildCam (id_test), classes: 178, samples: 8,154, Shift-Cov samples: iWildCam (ood_test) 42,791. Task: Amazon, In-D (split): Amazon (id_test), classes: 5, samples: 46,950, Shift-Cov samples: Amazon (test) 100,050. Task: CIFAR, In-D (split): CIFAR-10 (val), classes: 10, samples: 10,000, Shift-Cov samples: CIFAR-10-C (severity 3) 10,000, Shift-Label samples: CIFAR-100 10,000. |
| Hardware Specification | No | The authors acknowledge the Minnesota Supercomputing Institute (MSI) at the University of Minnesota for providing resources that contributed to the research results reported in this article. |
| Software Dependencies | No | The paper mentions 'timm' (Wightman, 2019) for model retrieval, and 'PyTorch' for reimplementing ScNet. However, it does not specify exact version numbers for these software libraries, nor for Python, CUDA, or other dependencies. |
| Experiment Setup | Yes | Table 7: Key hyperparameters for the ScNet training used in this paper. Dataset: CIFAR-10, Model architecture: VGG, Dropout prob.: 0.3, Target coverage: 0.7, Batch size: 128, Total epochs: 300, Lr (base): 0.1, Scheduler: StepLR. Dataset: ImageNet-1k, Model architecture: resnet34, Dropout prob.: N/A, Target coverage: 0.7, Batch size: 768, Total epochs: 250, Lr (base): 0.1, Scheduler: CosineAnnealingLR. |
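The flattened hyperparameter quote above can be read back as two per-dataset configurations. A sketch encoding them as plain dicts (the field names are ours for illustration, not taken from the released code; `None` stands for the table's "N/A"):

```python
# Key ScNet training hyperparameters as reported in the paper's Table 7,
# one config dict per dataset. Scheduler strings match the PyTorch
# lr_scheduler class names the table appears to reference.
SCNET_CONFIGS = {
    "CIFAR-10": {
        "arch": "VGG",
        "dropout": 0.3,
        "target_coverage": 0.7,
        "batch_size": 128,
        "epochs": 300,
        "base_lr": 0.1,
        "scheduler": "StepLR",
    },
    "ImageNet-1k": {
        "arch": "resnet34",
        "dropout": None,          # reported as N/A
        "target_coverage": 0.7,
        "batch_size": 768,
        "epochs": 250,
        "base_lr": 0.1,
        "scheduler": "CosineAnnealingLR",
    },
}
```

Note that the target coverage (0.7) and base learning rate (0.1) are shared across both setups; only the architecture, batch size, epoch budget, and scheduler differ.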