A Comparative Evaluation of Quantification Methods

Authors: Tobias Schumacher, Markus Strohmaier, Florian Lemmerich

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we close this research gap by conducting a thorough empirical performance comparison of 24 different quantification methods on more than 40 datasets in total, considering binary as well as multiclass quantification settings. We observe that no single algorithm generally outperforms all competitors, but identify a group of methods that perform best in the binary setting...
Researcher Affiliation | Academia | Tobias Schumacher (University of Mannheim, Germany; RWTH Aachen University, Germany); Markus Strohmaier (University of Mannheim, Germany; GESIS Leibniz Institute for the Social Sciences, Germany; Complexity Science Hub, Austria); Florian Lemmerich (University of Passau, Germany)
Pseudocode | No | The paper describes algorithms in prose within Section "3. Algorithms for Quantification" but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation of the algorithms and experiments can be found on GitHub: https://github.com/tobiasschumacher/quantification_paper
Open Datasets | Yes | We applied all algorithms on a broad range of 40 datasets collected from the UCI machine learning repository (https://archive.ics.uci.edu/ml/index.php) and from Kaggle (https://www.kaggle.com/datasets). An overview of these datasets, along with their characteristics and the abbreviations we use when describing our results, is given in Table 2.
Dataset Splits | Yes | Regarding training and test distributions, in the binary case we considered different prevalences of training positives p_pos^train ∈ {0.05, 0.1, 0.3, 0.5, 0.7, 0.9} and test positives p_pos^test ∈ {0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, following the protocol introduced by Forman (2008). ... In both binary and multiclass settings, we considered splits with relative amounts of training versus test data samples in {(0.1, 0.9), (0.3, 0.7), (0.5, 0.5), (0.7, 0.3)}, thereby simulating scenarios in which we have little as well as relatively much data at hand to train our models.
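The sampling protocol quoted above can be sketched in a few lines. The helper below is hypothetical (the paper's own sampling code may differ): it draws a fixed-size subset of a labeled dataset at a target positive-class prevalence, which is the core operation behind the Forman (2008)-style prevalence grid.

```python
import numpy as np

def sample_at_prevalence(y, pos_prevalence, n, rng):
    """Draw indices of an n-sample subset of y with the given positive prevalence.

    Hypothetical helper illustrating the binary evaluation protocol;
    assumes y contains enough samples of each class for the draw.
    """
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_pos = int(round(pos_prevalence * n))
    idx = np.concatenate([
        rng.choice(pos, size=n_pos, replace=False),
        rng.choice(neg, size=n - n_pos, replace=False),
    ])
    rng.shuffle(idx)
    return idx

# The grids reported in the paper: training/test prevalences and split ratios
train_prevs = [0.05, 0.1, 0.3, 0.5, 0.7, 0.9]
test_prevs = [0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
split_ratios = [(0.1, 0.9), (0.3, 0.7), (0.5, 0.5), (0.7, 0.3)]
```

Iterating over the cross-product of these grids reproduces the shape of the experimental design, though not necessarily the authors' exact sampling implementation.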
Hardware Specification | No | The authors acknowledge support by the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grant INST 35/1597-1 FUGG. This mentions an HPC resource but does not provide specific hardware details such as CPU or GPU models.
Software Dependencies | No | Except for the SVMperf-based quantifiers and quantification forests, all algorithms were implemented from scratch in Python 3, using scikit-learn as the base implementation for the underlying classifiers and the package cvxpy (Diamond and Boyd, 2016) to solve constrained optimization problems. The versions of `scikit-learn` and `cvxpy` are not specified.
Experiment Setup | Yes | In our main experiments, we chose the following hyperparameters for the quantifiers: as mentioned above, for all methods that use a classifier to perform quantification, we used the logistic regression classifier with the default L-BFGS solver along with its built-in probability estimator provided by scikit-learn, and set the maximum number of iterations to 1000. We always used stratified 10-fold cross-validation on the training set when estimating the misclassification rates or computing the set of scores and thresholds that the quantifiers needed. ... For the DyS framework, including the HDy method, we chose to divide its confidence scores into 10 bins...
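The base-classifier setup described in this row maps directly onto scikit-learn. The sketch below shows the stated configuration (L-BFGS logistic regression with `max_iter=1000`, stratified 10-fold cross-validation for out-of-fold scores, 10 bins for DyS/HDy-style histograms); the toy data and the `shuffle`/`random_state` choices are assumptions, since the paper does not report them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Classifier configuration as quoted: default L-BFGS solver, max_iter=1000
clf = LogisticRegression(solver="lbfgs", max_iter=1000)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Synthetic stand-in for a training set (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Out-of-fold posterior estimates via stratified 10-fold CV,
# as quantifiers need unbiased scores on the training data
scores = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

# DyS/HDy-style step: divide the confidence scores into 10 bins
hist, _ = np.histogram(scores, bins=10, range=(0.0, 1.0))
```

The out-of-fold `predict_proba` scores play the role of the "scores and thresholds" mentioned in the quote; threshold-based quantifiers would derive their misclassification-rate estimates from the same cross-validated predictions.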