ASPEST: Bridging the Gap Between Active Learning and Selective Prediction
Authors: Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan Ö. Arık, Somesh Jha, Tomas Pfister
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on numerous image, text and structured datasets, which suffer from domain shifts, demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST→SVHN benchmark with the labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop. |
| Researcher Affiliation | Collaboration | Jiefeng Chen (University of Wisconsin-Madison), Jinsung Yoon (Google), Sayna Ebrahimi (Google), Sercan Ö. Arık (Google), Somesh Jha (University of Wisconsin-Madison; Google), Tomas Pfister (tpfister@google.com, Google) |
| Pseudocode | Yes | Algorithm 1 Softmax Response with Active Learning Algorithm 2 Deep Ensembles with Active Learning Algorithm 3 Active Selective Prediction using Ensembles and Self-Training |
| Open Source Code | Yes | Our code is available at: https://github.com/google-research/google-research/tree/master/active_selective_prediction. |
| Open Datasets | Yes | Specifically, we use the following datasets with distribution shift: (i) MNIST→SVHN (LeCun, 1998; Netzer et al., 2011), (ii) CIFAR-10→CINIC-10 (Krizhevsky et al., 2009; Darlow et al., 2018), (iii) FMoW (Koh et al., 2021), (iv) Amazon Review (Koh et al., 2021), (v) DomainNet (Peng et al., 2019) and (vi) Otto (Benjamin Bossan, 2015). |
| Dataset Splits | Yes | MNIST consists of 28×28 grayscale images of handwritten digits, containing in total 55,000 training images and 10,000 test images. We resize each image to 32×32 resolution and convert them to color. We use the training set of MNIST as D_tr and the test set of MNIST as the source validation dataset. SVHN consists of 32×32 colored images of digits obtained from house numbers in Google Street View images. The training set has 73,257 images and the test set has 26,032 images. We use the test set of SVHN as U_X. |
| Hardware Specification | Yes | We run all experiments with TensorFlow 2.0 on NVIDIA A100 GPUs in the Debian GNU/Linux 10 system. |
| Software Dependencies | Yes | We run all experiments with TensorFlow 2.0 on NVIDIA A100 GPUs in the Debian GNU/Linux 10 system. |
| Experiment Setup | Yes | Active learning hyper-parameters. We evaluate different methods with different labeling budget M values on each dataset. By default, we set the number of rounds T = 10 for all methods (Appendix F.6 presents the effect of T). During the active learning process, we fine-tune the model on the selected labeled test data. During fine-tuning, we don't apply any data augmentation to the test data. We use the same fine-tuning hyper-parameters for different methods to ensure a fair comparison. More details on the fine-tuning hyper-parameters can be found in Appendix E.4. Hyper-parameters of ASPEST. Table 2 comprehensively lists all the hyper-parameters used in ASPEST, along with their respective default values. We set λ = 1, ns = 1000 and N = 5 (see Appendix F.7 for the effect of N), which are the same as those for Deep Ensembles, for fair comparison. For all datasets, we use cs = 200, p = 0.1, η = 0.9, the number of self-training epochs to be 20 and ce = 5. |
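The Experiment Setup row can be collected into a small configuration sketch: the labeling budget M is spent over T = 10 rounds, with ensemble size N = 5 and the listed ASPEST defaults. This is an illustrative reconstruction only; the class and method names (`AspestConfig`, `per_round_budget`) are hypothetical and do not come from the released google-research code, and the even per-round budget split is an assumption.

```python
# Hypothetical sketch of the hyper-parameter defaults quoted above.
# Symbol names mirror the paper's notation (λ, ns, N, cs, p, η, ce);
# their roles are defined in the paper's Table 2, not restated here.
from dataclasses import dataclass


@dataclass
class AspestConfig:
    labeling_budget: int            # M, varies per dataset
    num_rounds: int = 10            # T, default for all methods
    ensemble_size: int = 5          # N, same as Deep Ensembles
    lam: float = 1.0                # λ
    ns: int = 1000
    cs: int = 200
    p: float = 0.1
    eta: float = 0.9                # η
    self_training_epochs: int = 20
    ce: int = 5

    def per_round_budget(self) -> int:
        """Labels queried per round, assuming M is split evenly over T rounds."""
        return self.labeling_budget // self.num_rounds


# E.g. the MNIST→SVHN benchmark with labeling budget M = 100:
cfg = AspestConfig(labeling_budget=100)
print(cfg.per_round_budget())  # 10 labels queried per round
```

With these defaults the only per-dataset knob left is the labeling budget M, which matches the report's note that the same fine-tuning hyper-parameters are reused across methods for a fair comparison.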