Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning
Authors: Yong Lin, Chen Liu, Chenlu Ye, Qing Lian, Yuan Yao, Tong Zhang
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To assess the effectiveness of our proposed method, we conducted extensive empirical experiments using deep neural networks on benchmark datasets. The results consistently showcase the superior performance of COPS compared to baseline methods, reaffirming its efficacy. Keywords: Subset Selection, Uncertainty Estimation, Model Misspecification |
| Researcher Affiliation | Academia | Yong Lin* (Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, China); Chen Liu* (Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, China); Chenlu Ye* (Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Illinois, USA); Qing Lian (Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, China); Yuan Yao (Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, China); Tong Zhang (Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Illinois, USA) |
| Pseudocode | Yes | Algorithm 1: Uncertainty estimation in linear softmax regression; Algorithm 2: COPS for sampling with labels on linear models; Algorithm 3: COPS for sampling without labels on linear models; Algorithm 4: COPS for sampling with labels on DNNs; Algorithm 5: COPS for sampling without labels on DNNs; Algorithm 6: Uncertainty estimation for DNNs; Algorithm 7: COPS with uncertainty clipping for sampling with labels on DNNs; Algorithm 8: COPS with uncertainty clipping for sampling without labels on DNNs; Algorithm 9: COPS with full details for sampling with labels on DNNs; Algorithm 10: COPS with full details for sampling without labels on DNNs. |
| Open Source Code | Yes | Our code can be found at https://github.com/corwinliu9669/COPS. |
| Open Datasets | Yes | CIFAR10 Krizhevsky et al. (2009): We utilize the original CIFAR10 dataset Krizhevsky et al. (2009). CIFAR10-N: We use CIFAR10-N, a corrupted version of CIFAR10 introduced by Wei et al. (2021). CIFAR100: From the CIFAR100 dataset Krizhevsky et al. (2009), we randomly select 200 samples for each class. IMDB: The IMDB dataset Maas et al. (2011) consists of positive and negative movie comments. SVHN: The SVHN dataset contains images of house numbers. Places365 (subset): We select ten classes from the Places365 dataset Zhou et al. (2017). |
| Dataset Splits | Yes | For all settings, we split the training set into two subsets: the probe set and the sampling set (both defined in Algorithms 4-5). We train 10 probe neural networks on the probe set and use them to estimate the uncertainty of each sample in the sampling set. Sampling with replacement, we select a subset of 300 samples per class from the sampling set according to Algorithms 4-5, on which we train a ResNet20 from scratch. CIFAR10 Krizhevsky et al. (2009): ... To construct the probe set, we randomly select 1000 samples from each class, while the remaining training samples form the sampling set. IMDB: ... We split 5000 samples from the training set for uncertainty estimation and run our scheme on the remaining 20000 samples. Table 3: The table describes the datasets used in our study. The Probe Set / Sampling Set column gives the number of samples in each set for every dataset, and the Target Size of Sub-sampling column gives the number of samples selected from the Sampling Set. |
| Hardware Specification | No | The paper mentions 'GPU hours' in Table 7 but does not specify the type or model of GPU, CPU, or any other specific hardware component used for the experiments. |
| Software Dependencies | No | The paper mentions the AdamW optimizer Loshchilov and Hutter (2019), SGD, and a cosine learning-rate decay schedule, but it does not specify version numbers for these or for any other key software libraries, frameworks (such as PyTorch or TensorFlow), or programming languages. |
| Experiment Setup | Yes | We use the AdamW optimizer Loshchilov and Hutter (2019) with cosine learning-rate decay for 150 epochs; the batch size is 256. We put a limit on the maximum weight when solving Eqn (9) to avoid large variance. Specifically, let u_i denote the uncertainty of the i-th sample. In Eqn (9), we use 1/u_i to reweight the selected data; to avoid large variance, we replace it with 1/max{β, u_i}. We simply set β = 0.1 for all experiments, following Citovsky et al. (2023). Table 4: This table lists the training details. We set the weight decay to 5e-4 for all experiments. "No schedule" means using the starting learning rate without modification during training; Schedule 1 decays the learning rate by 0.1 every 30 epochs; Schedule 2 uses the cosine learning schedule with T_max = 50 and η_min = 0. |
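The selection-and-reweighting step quoted above (sampling from the sampling set with replacement, then reweighting each selected sample by the clipped inverse uncertainty 1/max{β, u_i}) can be sketched as follows. This is a minimal illustrative sketch, not the authors' released code: the function name `cops_select`, its signature, and the assumption that selection probabilities are proportional to estimated uncertainty are ours; β = 0.1 matches the value reported in the paper.

```python
import numpy as np

def cops_select(uncertainties, target_size, beta=0.1, seed=0):
    """Illustrative COPS-style subset selection with uncertainty clipping.

    Draws `target_size` indices with replacement, with probability
    proportional to each sample's estimated uncertainty, then assigns
    each selected sample the clipped inverse-uncertainty weight
    1 / max(beta, u_i) used when solving the reweighted objective.
    """
    rng = np.random.default_rng(seed)
    u = np.asarray(uncertainties, dtype=float)
    probs = u / u.sum()  # sampling probability proportional to uncertainty (assumption)
    idx = rng.choice(len(u), size=target_size, replace=True, p=probs)
    # Clipping bounds the weights by 1/beta, preventing low-uncertainty
    # samples from receiving arbitrarily large weights (variance control).
    weights = 1.0 / np.maximum(beta, u[idx])
    return idx, weights
```

With β = 0.1, no selected sample's weight can exceed 1/β = 10, which is the variance-control effect the paper attributes to clipping.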