Learning to Abstain From Uninformative Data

Authors: Yikai Zhang, Songzhu Zheng, Mina Dalirrooyfard, Pengxiang Wu, Anderson Schneider, Anant Raj, Yuriy Nevmyvaka, Chao Chen

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We build upon the strength of our theoretical guarantees by describing an iterative algorithm, which jointly optimizes both a predictor and a selector, and evaluates its empirical performance in a variety of settings. ... In this section, we test the efficacy of our practical algorithm (Algorithm 1) on both publicly-available and semi-synthetic datasets.
Researcher Affiliation Collaboration Yikai Zhang (Morgan Stanley); Songzhu Zheng (Morgan Stanley); Mina Dalirrooyfard (Morgan Stanley); Pengxiang Wu (Snap Inc.); Anderson Schneider (Morgan Stanley); Anant Raj (Morgan Stanley); Yuriy Nevmyvaka (Morgan Stanley); Chao Chen (Stony Brook University)
Pseudocode Yes Algorithm 1 Iterative Soft Abstain (ISA)
1: Input: dataset S_n = {(x_1, y_1), ..., (x_n, y_n)}, weight parameter β, random initial classifier f̂^0 and selector ĝ^0, number of iterations T
2: for t = 1, ..., T do
3:   Optimize loss to update predictor f̂^t: -(1/n) Σ_{i=1}^n ĝ^t(x_i) {y_i log(f̂^t(x_i)) + (1 - y_i) log(1 - f̂^t(x_i))}
4:   Approximate the pseudo-informative label: z_i^t = 1{1{f̂^t(x_i) > 1/2} = y_i}
5:   Optimize loss to update selector ĝ^t: -Σ_{i=1}^n {z_i^t log(ĝ^t(x_i)) + β (1 - z_i^t) log(1 - ĝ^t(x_i))}
6: end for
7: Output: f̂^T, ĝ^T
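The ISA loop above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the toy data, the logistic-regression fitter `fit_weighted_logreg`, and the shifted uninformative cluster are all assumptions made so the example is self-contained; the paper trains neural networks on image/tabular data instead.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_weighted_logreg(X, t, sample_w, neg_w=1.0, lr=0.5, steps=400):
    """Minimise -sum_i w_i [ t_i log p_i + neg_w (1 - t_i) log(1 - p_i) ]
    by gradient descent on a logistic model p_i = sigmoid(X w + b)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        g = sample_w * (p * (t + neg_w * (1.0 - t)) - t)  # dLoss/dlogit
        w -= lr * (X.T @ g) / n
        b -= lr * g.mean()
    return lambda Xq: sigmoid(Xq @ w + b)

# Toy semi-synthetic data (assumption): first half is informative
# (label = sign of feature 0), second half is uninformative (random
# labels) and sits in a shifted region so the selector can find it.
n = 2000
X = rng.normal(size=(n, 2))
y = (X[:, 0] > 0).astype(float)
X[n // 2:, 1] += 3.0                                     # uninformative region
y[n // 2:] = rng.integers(0, 2, size=n // 2)             # labels carry no signal
informative = np.zeros(n, bool)
informative[: n // 2] = True

beta, T = 2.0, 3
g_hat = np.full(n, 0.5)                                  # initial selector scores
for _ in range(T):
    f = fit_weighted_logreg(X, y, g_hat)                 # step 3: predictor
    z = ((f(X) > 0.5).astype(float) == y).astype(float)  # step 4: pseudo labels
    g = fit_weighted_logreg(X, z, np.ones(n), neg_w=beta)  # step 5: selector
    g_hat = g(X)

# Selector scores should be higher on the informative half.
print(g_hat[informative].mean(), g_hat[~informative].mean())
```

The β > 1 weight on the (1 - z) term makes the selector more eager to abstain on points the predictor keeps getting wrong, matching line 5 of Algorithm 1.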
Open Source Code Yes The code for reproducing the results could be found in https://github.com/morganstanley/MSML/tree/main/paper/Learn_to_Abstain.
Open Datasets Yes We test the efficacy of our practical algorithm (Algorithm 1) on both publicly-available and semi-synthetic datasets. ... For MNIST+Fashion-MNIST dataset, images from MNIST are defined to be uninformative, while images from Fashion-MNIST are set to be informative. For SVHN (Netzer et al., 2011) dataset, class 5-9 are set to be uninformative and class 0-4 are set to be informative. ... In this section, we report our empirical study on 3 publicly-available datasets: (1) Oxford realized volatility (Volatility) dataset (Heber et al., 2009), (2) breast ultrasound images (BUS) (Al-Dhabyani et al., 2020), and (3) lending club dataset (LC) (Lending Club, 2007). The data is retrieved from Kaggle, https://www.kaggle.com/datasets/wordsforthewise/lending-club.
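The SVHN-style semi-synthetic construction quoted above can be sketched as follows. This is a hedged reading, not the paper's code: it assumes "uninformative" means the sample's label is replaced with a uniformly random one, and the helper name `make_semi_synthetic` is invented for illustration; the paper's exact relabeling protocol may differ.

```python
import numpy as np

def make_semi_synthetic(labels, uninformative_classes, num_classes, rng):
    """Return a copy of `labels` where samples whose true class is in
    `uninformative_classes` get a uniform random label (no signal),
    plus the boolean mask of which samples were made uninformative."""
    labels = np.asarray(labels).copy()
    mask = np.isin(labels, list(uninformative_classes))
    labels[mask] = rng.integers(0, num_classes, size=mask.sum())
    return labels, mask

rng = np.random.default_rng(0)
# Stand-in for SVHN labels: classes 5-9 uninformative, 0-4 informative.
y = rng.integers(0, 10, size=1000)
y_new, uninf = make_semi_synthetic(y, range(5, 10), 10, rng)
```

Under this construction the informative samples keep their original labels, while the uninformative subset contributes only label noise, which is exactly the regime the selector is meant to detect.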
Dataset Splits Yes The data description and respective train-test splits are presented in Table 11 in section D.1. ... Table 11: Real-world Dataset Description — Dataset | Type | Feature Dim | #Class | #Train | #Test: Volatility | Time Series | 2 | 2 | 143784 | 9525; BUS | Image | 3x324x324 | 3 | 624 | 156; LC | Tabular | 1805 | 3 | 1058093 | 264524
Hardware Specification Yes We execute our program on Red Hat Enterprise Linux Server 7.9 (Maipo) and use NVIDIA V100 GPU with cuda version 12.1.
Software Dependencies Yes We conduct all our experiments using PyTorch 3.10 (Paszke et al., 2019). We execute our program on Red Hat Enterprise Linux Server 7.9 (Maipo) and use NVIDIA V100 GPU with cuda version 12.1.
Experiment Setup Yes For all of our synthetic experiments, we use Adam optimizer with learning rate 1e-3 and weight decay rate 1e-4. We use batch size 256 and train 60 epochs for MNIST+Fashion and 120 epochs for SVHN. The learning rate is reduced by 0.5 at epochs 15, 35 and 55 for MNIST+Fashion and is reduced by 0.5 at epochs 40, 60 and 80 for SVHN. ... For Volatility and LC, we use the same Adam optimizer with learning rate (1e-3), weight decay rate (1e-4) and batch size 256. For BUS, due to its limited sample size, we use smaller batch size (16) and reduce learning rate (1e-4) accordingly. For all three datasets, we train each algorithm for 50 epochs and reduce the learning rate by half at epochs 15 and 35.
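The step schedule described above (base rate halved at fixed milestone epochs, as PyTorch's MultiStepLR does) can be written as a small pure-Python helper. The function name and the convention that the drop applies from the milestone epoch onward are assumptions for illustration:

```python
def lr_at_epoch(epoch, base_lr=1e-3, milestones=(15, 35, 55), gamma=0.5):
    """Learning rate in effect at `epoch`, multiplied by `gamma`
    once for every milestone already reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

# MNIST+Fashion schedule from the quoted setup: 1e-3 halved at 15, 35, 55.
# The SVHN schedule would use milestones=(40, 60, 80) with 120 epochs.
```

For BUS, the same helper would be called with base_lr=1e-4 and milestones=(15, 35), matching the smaller batch size (16) and 50-epoch budget quoted above.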