Unbiased Loss Functions for Multilabel Classification with Missing Labels
Authors: Erik Schultheis, Rohit Babbar
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The theoretical considerations are further supplemented by an experimental study showing that the switch to unbiased estimators significantly alters the bias-variance trade-off and may thus require stronger regularization. In order to judge the severity of the variance and overfitting problems in practice, we conducted three experiments. First, in a pure evaluation setting on synthetic data, we calculated the unbiased recall@k for a varying fraction of missing labels, which shows that once this fraction becomes too large, the variance of the estimate explodes and it becomes unusable. The second experiment serves as a demonstration of the change in bias-variance trade-off resulting from switching to unbiased estimates. Finally, we repeat that experiment using real data from the Yahoo-Music R3 dataset... (Section 7, "Evaluation Experiments"; Section 8, "Training Experiments") |
| Researcher Affiliation | Academia | Erik Schultheis EMAIL Aalto University Espoo, Finland Rohit Babbar EMAIL University of Bath & Aalto University Bath, UK & Espoo, Finland |
| Pseudocode | No | The paper describes mathematical derivations and experimental procedures but does not include any explicitly labeled pseudocode or algorithm blocks. The methods are described narratively and through equations. |
| Open Source Code | Yes | The code for the experiments is provided at https://github.com/xmc-aalto/missing-labels-tmlr. |
| Open Datasets | Yes | Finally, we repeat that experiment using real data from the Yahoo-Music R3 dataset, which has been sampled in such a way that the proportion of missing labels can be estimated reliably. ... We took the AmazonCat-13K data and consider only the 100 most common labels... The networks used to generate the results in Table 3 were trained using the DiSMEC (Babbar and Schölkopf, 2017) algorithm, with the loss function being either the squared-hinge-loss (VN) or a squared-hinge-loss based convex surrogate of the unbiased estimate of the 0-1 loss as described in Qaraei et al. (2021). The datasets have been taken from the Extreme Classification Repository (Bhatia et al., 2016), and preprocessed by doing a tf-idf transformation. (Mentioning Bibtex, Mediamill, RCV1-2K, AmazonCat-13K, AmazonCat-14K, Eurlex-4K, Amazon-670K, WikiLSHTC-325K in tables and text). |
| Dataset Splits | Yes | We took 30% of the original training data and used them as validation data to determine the optimal value for the strength of L2-regularization. ... the test set is based on explicitly querying the user on 10 out of the 1000 possible songs chosen uniformly at random. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. It only mentions general terms like 'training' or 'network'. |
| Software Dependencies | No | The paper mentions that the network is optimized using the Adam optimizer (Kingma and Ba, 2017), but it does not specify version numbers for any software libraries or frameworks used in the experiments. |
| Experiment Setup | Yes | On this data, we train a linear classifier with L2-regularization using different basis loss functions... The network is optimized using Adam (Kingma and Ba, 2017) with an initial learning rate of 10^-4 for the first 15 epochs and 10^-5 for the remaining five epochs, with a mini-batch size of 512. We took 30% of the original training data and used them as validation data to determine the optimal value for the strength of L2-regularization. |
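The paper's core idea, an unbiased estimate of a per-label loss when positive labels go unobserved, can be illustrated with a short sketch. This is not the authors' code: the function name, the scalar loss values, and the simple missing-at-random model (each true positive is observed with known propensity `p`) are assumptions for illustration, following the standard propensity-corrected construction referenced in the report (Qaraei et al., 2021).

```python
import numpy as np

def unbiased_loss(loss_pos, loss_neg, y_obs, p):
    """Propensity-corrected loss under missing positive labels.

    Model: an observed positive (y_obs == 1) is always a true positive;
    an observed negative may be a true negative or a positive that went
    unobserved (each true positive is seen with probability p).

    Correcting observed positives to (loss_pos - (1-p)*loss_neg) / p
    makes the estimator unbiased with respect to the true label:
    for a true positive, p * corrected + (1-p) * loss_neg == loss_pos.
    """
    return np.where(y_obs == 1,
                    (loss_pos - (1.0 - p) * loss_neg) / p,
                    loss_neg)

# Sanity check of unbiasedness for a true positive seen with propensity p.
p = 0.6
lp, ln = 2.0, 0.5          # example clean-loss values on a pos/neg label
corrected = unbiased_loss(lp, ln, 1, p)
expectation = p * corrected + (1 - p) * unbiased_loss(lp, ln, 0, p)
assert np.isclose(expectation, lp)
```

The correction term grows like 1/p, which is exactly the variance blow-up the report's "Research Type" row describes: as the fraction of missing labels rises (p shrinks), the unbiased estimate becomes increasingly noisy and eventually unusable without stronger regularization.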