Partial-Label Learning with a Reject Option
Authors: Tobias Fuchs, Florian Kalinke, Klemens Böhm
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on artificial and real-world datasets show that our method provides the best trade-off between the number and accuracy of non-rejected predictions when compared to our competitors, which use confidence thresholds for rejecting unsure predictions. When evaluated without the reject option, our nearest-neighbor-based approach also achieves competitive prediction performance. |
| Researcher Affiliation | Academia | Tobias Fuchs EMAIL Karlsruhe Institute of Technology, Germany |
| Pseudocode | Yes | Algorithm 1 DST-PLL (Our proposed method) |
| Open Source Code | Yes | Experiments. Extensive experiments on artificial and real-world data support our claims. We make our code and data openly available: https://github.com/mathefuchs/pll-with-a-reject-option |
| Open Datasets | Yes | For the supervised datasets, we use the ecoli (Horton & Nakai, 1996), multiple-features (Duin, 2002), pen-digits (Alpaydin & Alimoglu, 1998), semeion (Buscema & Terzi, 2008), solar-flare (Dodson & Hedeman, 1989), statlog-landsat (Srinivasan, 1993), and theorem datasets (Bridge et al., 2013) from the UCI repository (Bache & Lichman, 2013). These datasets contain between 336 and 10 992 instances each. Also, we use the popular MNIST (LeCun et al., 1999), KMNIST (Clanuwat et al., 2018), and FMNIST datasets (Xiao et al., 2018), which contain 60 000 images each similar to other datasets like CIFAR-10 and CIFAR-100 (Krizhevsky, 2009). For the partially labeled data, we use the bird-song (Briggs et al., 2012), flickr (Huiskes & Lew, 2008), yahoo-news (Guillaumin et al., 2010), and msrc-v2 datasets (Liu & Dietterich, 2012). |
| Dataset Splits | No | The paper states: "We repeat all experiments five times to report averages and standard deviations." and references a "default protocol" for data handling. However, it does not give explicit training/test/validation percentages, sample counts, or a partitioning procedure detailed enough for exact reproduction. |
| Hardware Specification | Yes | All experiments need two to three days on a machine with 48 cores and one NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper states: "We have implemented all approaches in Python using the PyTorch library." However, it does not provide version numbers for Python, PyTorch, or any other library. |
| Experiment Setup | Yes | As mentioned in Section 5.1, we consider ten commonly used PLL approaches. We choose their parameters as recommended by the respective authors. Pl-Knn (Hüllermeier & Beringer, 2005): For all non-MNIST datasets, we use k = 10 neighbors as recommended by the authors. For the MNIST datasets, we use the hidden representation of a variational auto-encoder as instance features and use k = 20. The variational auto-encoder has a 768-dimensional input layer (flat MNIST input), a 512-dimensional second layer, and 48-dimensional bottleneck layers for the mean and variance representations. The decoder uses a 48-dimensional first layer, a 512-dimensional second layer, and a 768-dimensional output layer with sigmoid activation. Otherwise, we use ReLU activations between all layers. Binary cross-entropy is used as a reconstruction loss. We choose the AdamW optimizer for training. Pl-Svm (Nguyen & Caruana, 2008): We use the Pegasos optimizer (Shalev-Shwartz et al., 2007) and λ = 1. Ipal (Zhang & Yu, 2015): We use k = 10 neighbors, α = 0.95, and 100 iterations. Pl-Ecoc (Zhang et al., 2017): We use L = 10 log2(l) and τ = 0.1 as recommended. Proden (Lv et al., 2020): For a fair comparison, we use the same base models for all neural-network-based approaches. We use a standard d-300-300-300-l MLP (Werbos, 1974) for the non-MNIST datasets with ReLU activations, batch normalizations, and softmax output. For the MNIST datasets, we use the LeNet architecture (LeCun et al., 1998). We choose the Adam optimizer for training. Cc (Feng et al., 2020): We use the same base models as mentioned above for Proden. Valen (Xu et al., 2021): We use the same base models as mentioned above for Proden. Pop (Xu et al., 2023): We use the same base models as mentioned above for Proden. Also, we set e0 = 0.001, eend = 0.04, and es = 0.001. We abstain from using the data augmentations discussed in the paper for a fair comparison. CroSel (Tian et al., 2024): We use the same base models as mentioned above for Proden. We use 10 warm-up epochs using Cc and λcr = 2. We abstain from using the data augmentations discussed in the paper for a fair comparison. Dst-Pll (our proposed approach): Similar to Pl-Knn and Ipal, we use k = 10 neighbors for the non-MNIST datasets. For the MNIST datasets, we use the hidden representation of a variational auto-encoder as instance features and use k = 20. The architecture of the variational auto-encoder is the same as described above for Pl-Knn. |
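The "trade-off between the number and accuracy of non-rejected predictions" mentioned in the Research Type row can be made concrete with a small coverage/selective-accuracy computation. The sketch below is ours, not the paper's code: it mirrors the confidence-threshold rejection scheme attributed to the competitors, and the function name `coverage_accuracy` and the toy arrays are illustrative assumptions.

```python
def coverage_accuracy(preds, confidences, labels, threshold):
    """Confidence-threshold rejection (illustrative, not the paper's code).

    A prediction is kept only if its confidence reaches the threshold.
    Returns (coverage, selective_accuracy):
      coverage           -- fraction of instances not rejected
      selective_accuracy -- accuracy among the non-rejected predictions
    """
    kept = [(p, y) for p, c, y in zip(preds, confidences, labels)
            if c >= threshold]
    if not kept:
        return 0.0, float("nan")  # everything was rejected
    coverage = len(kept) / len(preds)
    accuracy = sum(p == y for p, y in kept) / len(kept)
    return coverage, accuracy

# Toy example: five predictions, two of them low-confidence.
preds = [0, 1, 1, 2, 0]
conf = [0.9, 0.4, 0.8, 0.3, 0.7]
labels = [0, 0, 1, 2, 1]
cov, acc = coverage_accuracy(preds, conf, labels, threshold=0.5)
# cov = 0.6 (3 of 5 kept); acc = 2/3 on the kept predictions
```

Raising the threshold trades coverage for selective accuracy; the paper's claim is that DST-PLL sits on a better point of this curve than threshold-based competitors.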
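The variational auto-encoder used to embed the MNIST-style datasets is specified in the setup row only by its layer widths (768 → 512 → 48-dimensional mean/log-variance bottleneck, mirrored decoder with sigmoid output, ReLU in between). A minimal NumPy forward pass, assuming those stated widths and untrained random weights, makes the shapes concrete; this is a sketch of the architecture only, not the authors' PyTorch implementation, which additionally trains with binary cross-entropy reconstruction loss and AdamW.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dense(n_in, n_out):
    # Untrained random weights; the paper trains with BCE loss + AdamW.
    return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)

# Encoder: 768 (flat input width as stated) -> 512 -> two 48-dim heads.
W1, b1 = dense(768, 512)
W_mu, b_mu = dense(512, 48)
W_lv, b_lv = dense(512, 48)
# Decoder: 48 -> 512 -> 768, sigmoid output.
W2, b2 = dense(48, 512)
W3, b3 = dense(512, 768)

def encode(x):
    h = relu(x @ W1 + b1)
    return h @ W_mu + b_mu, h @ W_lv + b_lv  # mean, log-variance

def decode(z):
    return sigmoid(relu(z @ W2 + b2) @ W3 + b3)

x = rng.random((4, 768))  # a batch of 4 flat inputs
mu, logvar = encode(x)
# Reparameterization trick: sample z from N(mu, exp(logvar)).
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
x_rec = decode(z)
# mu.shape == (4, 48); x_rec.shape == (4, 768)
```

The 48-dimensional `mu` is what Pl-Knn and Dst-Pll would use as the instance features for their k = 20 nearest-neighbor search on the MNIST datasets.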