Realistic Evaluation of Deep Partial-Label Learning Algorithms

Authors: Wei Wang, Dong-Dong Wu, Jindong Wang, Gang Niu, Min-Ling Zhang, Masashi Sugiyama

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we delve into the empirical perspective of PLL and identify several critical but previously overlooked issues... Based on these findings, we propose PLENCH, the first Partial-Label learning bENCHmark to systematically compare state-of-the-art deep PLL algorithms... We also create Partial-Label CIFAR-10 (PLCIFAR10), an image dataset of human-annotated partial labels collected from Amazon Mechanical Turk, to provide a testbed for evaluating the performance of PLL algorithms in more realistic scenarios. 6 EXPERIMENTS; 6.1 EXPERIMENTAL RESULTS ON TABULAR DATASETS; 6.2 EXPERIMENTAL RESULTS ON IMAGE DATASETS
Researcher Affiliation Academia 1 The University of Tokyo, Chiba, Japan 2 RIKEN, Tokyo, Japan 3 Southeast University, Nanjing, China 4 William & Mary, Williamsburg, VA, USA
Pseudocode Yes For example, to implement CAVL (Zhang et al., 2022), we can write the following code:

    class CAVL(Algorithm):
        def __init__(self, input_shape, train_given_Y, hparams):
            super(CAVL, self).__init__(input_shape, train_given_Y, hparams)
            self.featurizer = networks.Featurizer(input_shape, self.hparams)
            self.classifier = networks.Classifier(
                self.featurizer.n_outputs, self.num_classes)
            self.network = nn.Sequential(self.featurizer, self.classifier)
            self.optimizer = torch.optim.Adam(
                self.network.parameters(),
                lr=self.hparams["lr"],
                weight_decay=self.hparams["weight_decay"])
            # Initialize label confidence uniformly over each example's
            # candidate labels.
            train_given_Y = torch.from_numpy(train_given_Y)
            temp_Y = train_given_Y.sum(dim=1).unsqueeze(1).repeat(
                1, train_given_Y.shape[1])
            label_confidence = train_given_Y.float() / temp_Y
            self.label_confidence = label_confidence.double()
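The label-confidence initialization quoted above can be checked in isolation. A minimal standalone sketch (outside the PLENCH Algorithm class, with a toy candidate-label matrix of our own invention):

```python
import torch

# Candidate-label matrix for 3 examples and 4 classes:
# an entry of 1 marks a candidate label for that example.
train_given_Y = torch.tensor([
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
])

# Normalize each row so the candidate labels share probability
# mass uniformly, as in the CAVL initialization quoted above.
row_sums = train_given_Y.sum(dim=1, keepdim=True)
label_confidence = train_given_Y.float() / row_sums
```

Each row of `label_confidence` sums to 1, with mass spread evenly over that example's candidates (e.g. the first row becomes [0.5, 0.5, 0, 0]).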
Open Source Code Yes The code implementation of PLENCH is available at https://github.com/wwangwitsel/PLENCH. The PLCIFAR10 dataset is available at https://github.com/wwangwitsel/PLCIFAR10.
Open Datasets Yes The code implementation of PLENCH is available at https://github.com/wwangwitsel/PLENCH. The PLCIFAR10 dataset is available at https://github.com/wwangwitsel/PLCIFAR10. ... Tabular datasets can be downloaded from https://palm.seu.edu.cn/zhangml/Resources.htm#data.
Dataset Splits Yes For tabular datasets, we first divided a test set from the entire dataset. Since the datasets were not explicitly divided into training and validation parts, we manually divided them into a partial-label training set DTr and a partial-label validation set DVal. ... We used three random data splits for PLCIFAR10 and five random data splits for tabular datasets.
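The splitting protocol described above (a held-out test set, then a manual division of the remainder into partial-label training and validation sets, repeated over several random splits) can be sketched as follows. The split fractions and seed are illustrative assumptions, not values reported by the paper:

```python
import numpy as np

def random_splits(n_examples, n_splits, test_frac=0.1, val_frac=0.1, seed=0):
    """Produce index sets for several random (train, val, test) splits.

    Mirrors the protocol of drawing multiple random data splits per
    dataset; the fractions are assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_splits):
        perm = rng.permutation(n_examples)
        n_test = int(test_frac * n_examples)
        n_val = int(val_frac * n_examples)
        test_idx = perm[:n_test]                    # held-out test set
        val_idx = perm[n_test:n_test + n_val]       # partial-label validation set
        train_idx = perm[n_test + n_val:]           # partial-label training set
        splits.append((train_idx, val_idx, test_idx))
    return splits

# Five random splits, as used for the tabular datasets.
splits = random_splits(n_examples=1000, n_splits=5)
```

Each split partitions the index range exactly once, so no example appears in more than one of its three subsets.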
Hardware Specification Yes All the algorithms were implemented in PyTorch (Paszke et al., 2019) and all experiments were conducted with a single NVIDIA Tesla V100 GPU.
Software Dependencies No All the algorithms were implemented in PyTorch (Paszke et al., 2019) and all experiments were conducted with a single NVIDIA Tesla V100 GPU. We used the Adam optimizer (Kingma & Ba, 2015).
Experiment Setup Yes We ran 60,000 iterations for the image datasets, 20,000 iterations for the Soccer Player, Italian, Yahoo! News, and English datasets, and 10,000 iterations for the other datasets. We recorded the performance on validation and test sets per 1,000 iterations. We used three random data splits for PLCIFAR10 and five random data splits for tabular datasets. For each data split, we selected 20 random hyperparameter configurations from a given pool. Table 8 shows the details of the hyperparameter configurations for all algorithms.
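The hyperparameter protocol above (20 random configurations drawn from a given pool per data split) amounts to simple random search. A minimal sketch; the pool contents below are illustrative assumptions, not the values from the paper's Table 8:

```python
import random

def sample_hparam_configs(pool, n_configs=20, seed=0):
    """Draw random hyperparameter configurations from a given pool.

    Follows the protocol of selecting 20 random configurations per
    data split; the pool itself must be supplied by the experimenter.
    """
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in pool.items()}
            for _ in range(n_configs)]

# Hypothetical pool for illustration only.
pool = {
    "lr": [1e-4, 1e-3, 1e-2],
    "weight_decay": [1e-5, 1e-4, 1e-3],
    "batch_size": [64, 128, 256],
}
configs = sample_hparam_configs(pool, n_configs=20)
```

Fixing the seed per data split makes the drawn configurations reproducible across algorithms, which is the usual reason benchmarks prefer a seeded random search over ad-hoc tuning.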