Deep Generalized Prediction Set Classifier and Its Theoretical Guarantees

Authors: Zhou Wang, Xingye Qiao

TMLR 2024

Reproducibility assessment — each entry gives the variable, the extracted result, and the supporting LLM response:
Research Type: Experimental. "Empirically, our method outperforms the baselines on several benchmark datasets. ... 5 Experiments. Baselines. Deep GPS is compared with baselines (GPS, BCOPS, and CDL) tailored to the task of set-valued classification with OOD detection. ... Datasets. We deploy all methods on CIFAR-10, MNIST, and Fashion-MNIST datasets. ... Metrics. We present the sample class-specific accuracy, the aligned OOD recall, and the aligned efficiency. ... Results. The results of the two SCOD methods are reported when the rate of incorrectly rejecting normal observations is around γ, by thresholding their score functions."
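The evaluation protocol quoted above — thresholding a score function so that roughly a γ fraction of normal observations is incorrectly rejected — can be sketched as follows. This is a minimal illustration, not the paper's code; the function name and the higher-score-means-more-normal convention are assumptions.

```python
def threshold_at_rate(normal_scores, gamma):
    """Pick a threshold on a (higher-is-more-normal) score so that
    roughly a gamma fraction of normal observations falls below it
    and would be incorrectly rejected."""
    s = sorted(normal_scores)
    k = int(len(s) * gamma)  # index of the gamma-quantile
    return s[k]

scores = list(range(1, 101))            # toy normal-class scores
t = threshold_at_rate(scores, 0.05)
rejected = [x for x in scores if x < t]
# about 5% of the normal observations fall below the threshold
```

With the threshold fixed this way on normal data, OOD recall and prediction-set efficiency can then be compared at a matched rejection rate across methods.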
Researcher Affiliation: Academia. Zhou Wang (EMAIL), Department of Mathematics and Statistics, Binghamton University, the State University of New York; Xingye Qiao (EMAIL), Department of Mathematics and Statistics, Binghamton University, the State University of New York.
Pseudocode: No. The paper describes the methodology in prose and includes figures illustrating concepts (e.g., Figure 1, Figure 2, Figure 3), but it does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code: No. The paper neither states that the source code is released nor links to a code repository for the described methodology.
Open Datasets: Yes. "Datasets. We deploy all methods on CIFAR-10, MNIST, and Fashion-MNIST datasets."
Dataset Splits: Yes. "We split each original dataset into three sets: a labeled set containing only normal classes to mimic distribution P, an unlabeled set mixing normal and OOD classes to mimic distribution Q, and the test set. The first two are used to train the model, and the test set is used to evaluate performance. ... The training set (the labeled and unlabeled sets mentioned in Section 5) is further split with a ratio of 9:1 into training data for learning f and validation data for tuning parameters."
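The 9:1 train/validation split described above can be sketched with the standard library; the helper name, seeding, and index-based interface are illustrative assumptions, not the paper's implementation.

```python
import random

def split_9_to_1(indices, seed=0):
    """Illustrative 9:1 split of the training pool (labeled plus
    unlabeled sets) into data for learning f and validation data
    for tuning; the helper name and fixed seed are assumptions."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_val = len(shuffled) // 10  # 1/10 of the pool held out
    return shuffled[n_val:], shuffled[:n_val]

train_idx, val_idx = split_9_to_1(range(50_000))
# 45,000 indices for learning f, 5,000 for validation
```

In the paper's setting both the labeled part (for checking the misclassification rate) and the unlabeled part (for measuring prediction-set size) of the validation data come out of this split.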
Hardware Specification: No. The paper does not describe the specific hardware (e.g., GPU models, CPU types, or cloud instance specifications) used to run the experiments.
Software Dependencies: No. "We use the Adam optimizer (Kingma & Ba, 2014) with learning rate lr = 10⁻⁴ and (β1, β2) = (0.999, 0.999). ... The ResNet18 (He et al., 2016) architecture is the same as the one in PyTorch, and LeNet-type CNNs (Ruff et al., 2020)." The paper mentions PyTorch and the Adam optimizer but does not specify their version numbers.
Experiment Setup: Yes. "Experiments run over 150 epochs with batch size 512. ... We use the Adam optimizer (Kingma & Ba, 2014) with learning rate lr = 10⁻⁴ and (β1, β2) = (0.999, 0.999). Additionally, we set the weight decay in Adam as 10⁻⁴. ... The values of γ are prescribed as 0.05, 0.01, and 0.01 for CIFAR-10, MNIST, and Fashion-MNIST, respectively. ... The tuning parameter C is determined such that the prediction set is smallest on the unlabeled part of the validation data when the misclassification rate is close to γ on the labeled part of the validation data."
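The quoted tuning rule for C can be sketched as a simple grid search. Everything here is a hypothetical sketch: the `eval_fn` interface, the tolerance `tol`, and the candidate grid are assumptions, not details from the paper.

```python
def select_C(candidates, eval_fn, gamma, tol=0.005):
    """Among candidate C values whose validation misclassification
    rate (labeled part) is within tol of gamma, return the one with
    the smallest average prediction-set size (unlabeled part)."""
    feasible = []
    for C in candidates:
        err, set_size = eval_fn(C)  # assumed (rate, size) interface
        if abs(err - gamma) <= tol:
            feasible.append((set_size, C))
    if not feasible:
        raise ValueError("no candidate meets the misclassification target")
    return min(feasible)[1]  # smallest set size wins

# toy eval_fn: maps C to (misclassification rate, mean set size)
results = {0.1: (0.050, 3.2), 0.5: (0.052, 2.1), 1.0: (0.200, 1.0)}
best_C = select_C(results, lambda C: results[C], gamma=0.05)
# both C = 0.1 and C = 0.5 meet the rate target; 0.5 gives the smaller set
```

The design mirrors the paper's stated criterion: feasibility is defined by the misclassification rate on labeled validation data, and efficiency (set size) on unlabeled validation data breaks ties among feasible candidates.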