Realistic Evaluation of Deep Partial-Label Learning Algorithms

Authors: Wei Wang, Dong-Dong Wu, Jindong Wang, Gang Niu, Min-Ling Zhang, Masashi Sugiyama

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we delve into the empirical perspective of PLL and identify several critical but previously overlooked issues... Based on these findings, we propose PLENCH, the first Partial-Label learning bENCHmark to systematically compare state-of-the-art deep PLL algorithms... We also create Partial-Label CIFAR-10 (PLCIFAR10), an image dataset of human-annotated partial labels collected from Amazon Mechanical Turk, to provide a testbed for evaluating the performance of PLL algorithms in more realistic scenarios. 6 EXPERIMENTS; 6.1 EXPERIMENTAL RESULTS ON TABULAR DATASETS; 6.2 EXPERIMENTAL RESULTS ON IMAGE DATASETS
Researcher Affiliation Academia 1 The University of Tokyo, Chiba, Japan 2 RIKEN, Tokyo, Japan 3 Southeast University, Nanjing, China 4 William & Mary, Williamsburg, VA, USA
Pseudocode Yes For example, to implement CAVL (Zhang et al., 2022), we can write the following code:

    class CAVL(Algorithm):
        def __init__(self, input_shape, train_given_Y, hparams):
            super(CAVL, self).__init__(input_shape, train_given_Y, hparams)
            self.featurizer = networks.Featurizer(input_shape, self.hparams)
            self.classifier = networks.Classifier(
                self.featurizer.n_outputs, self.num_classes)
            self.network = nn.Sequential(self.featurizer, self.classifier)
            self.optimizer = torch.optim.Adam(
                self.network.parameters(),
                lr=self.hparams["lr"],
                weight_decay=self.hparams["weight_decay"])
            # Initialize label confidence uniformly over each example's
            # candidate labels.
            train_given_Y = torch.from_numpy(train_given_Y)
            temp_Y = train_given_Y.sum(dim=1).unsqueeze(1).repeat(
                1, train_given_Y.shape[1])
            label_confidence = train_given_Y.float() / temp_Y
            self.label_confidence = label_confidence.double()
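The label-confidence initialization quoted above can be checked in isolation. A minimal standalone sketch (outside the PLENCH Algorithm class, with a toy candidate-label matrix of our own invention):

```python
import torch

# Candidate-label matrix for 3 examples and 4 classes:
# an entry of 1 marks a candidate label for that example.
train_given_Y = torch.tensor([
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
])

# Normalize each row so the candidate labels share probability
# mass uniformly, as in the CAVL initialization quoted above.
row_sums = train_given_Y.sum(dim=1, keepdim=True)
label_confidence = train_given_Y.float() / row_sums
```

Each row of `label_confidence` sums to 1, with mass spread evenly over that example's candidates (e.g. the first row becomes [0.5, 0.5, 0, 0]).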
Open Source Code Yes The code implementation of PLENCH is available at https://github.com/wwangwitsel/PLENCH. The PLCIFAR10 dataset is available at https://github.com/wwangwitsel/PLCIFAR10.
Open Datasets Yes The code implementation of PLENCH is available at https://github.com/wwangwitsel/PLENCH. The PLCIFAR10 dataset is available at https://github.com/wwangwitsel/PLCIFAR10. ... Tabular datasets can be downloaded from https://palm.seu.edu.cn/zhangml/Resources.htm#data.
Dataset Splits Yes For tabular datasets, we first divided a test set from the entire dataset. Since the datasets were not explicitly divided into training and validation parts, we manually divided them into a partial-label training set DTr and a partial-label validation set DVal. ... We used three random data splits for PLCIFAR10 and five random data splits for tabular datasets.
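The splitting protocol described above (a held-out test set, then a manual division of the remainder into partial-label training and validation sets, repeated over several random splits) can be sketched as follows. The split fractions and seed are illustrative assumptions, not values reported by the paper:

```python
import numpy as np

def random_splits(n_examples, n_splits, test_frac=0.1, val_frac=0.1, seed=0):
    """Produce index sets for several random (train, val, test) splits.

    Mirrors the protocol of drawing multiple random data splits per
    dataset; the fractions are assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_splits):
        perm = rng.permutation(n_examples)
        n_test = int(test_frac * n_examples)
        n_val = int(val_frac * n_examples)
        test_idx = perm[:n_test]                    # held-out test set
        val_idx = perm[n_test:n_test + n_val]       # partial-label validation set
        train_idx = perm[n_test + n_val:]           # partial-label training set
        splits.append((train_idx, val_idx, test_idx))
    return splits

# Five random splits, as used for the tabular datasets.
splits = random_splits(n_examples=1000, n_splits=5)
```

Each split partitions the index range exactly once, so no example appears in more than one of its three subsets.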
Hardware Specification Yes All the algorithms were implemented in PyTorch (Paszke et al., 2019) and all experiments were conducted with a single NVIDIA Tesla V100 GPU.
Software Dependencies No All the algorithms were implemented in PyTorch (Paszke et al., 2019) and all experiments were conducted with a single NVIDIA Tesla V100 GPU. We used the Adam optimizer (Kingma & Ba, 2015).
Experiment Setup Yes We ran 60,000 iterations for the image datasets, 20,000 iterations for the Soccer Player, Italian, Yahoo! News, and English datasets, and 10,000 iterations for the other datasets. We recorded the performance on validation and test sets per 1,000 iterations. We used three random data splits for PLCIFAR10 and five random data splits for tabular datasets. For each data split, we selected 20 random hyperparameter configurations from a given pool. Table 8 shows the details of the hyperparameter configurations for all algorithms.
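The hyperparameter protocol above (20 random configurations drawn from a given pool per data split) amounts to simple random search. A minimal sketch; the pool contents below are illustrative assumptions, not the values from the paper's Table 8:

```python
import random

def sample_hparam_configs(pool, n_configs=20, seed=0):
    """Draw random hyperparameter configurations from a given pool.

    Follows the protocol of selecting 20 random configurations per
    data split; the pool itself must be supplied by the experimenter.
    """
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in pool.items()}
            for _ in range(n_configs)]

# Hypothetical pool for illustration only.
pool = {
    "lr": [1e-4, 1e-3, 1e-2],
    "weight_decay": [1e-5, 1e-4, 1e-3],
    "batch_size": [64, 128, 256],
}
configs = sample_hparam_configs(pool, n_configs=20)
```

Fixing the seed per data split makes the drawn configurations reproducible across algorithms, which is the usual reason benchmarks prefer a seeded random search over ad-hoc tuning.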