A Comparative Evaluation of Quantification Methods
Authors: Tobias Schumacher, Markus Strohmaier, Florian Lemmerich
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we close this research gap by conducting a thorough empirical performance comparison of 24 different quantification methods on in total more than 40 datasets, considering binary as well as multiclass quantification settings. We observe that no single algorithm generally outperforms all competitors, but identify a group of methods that perform best in the binary setting... |
| Researcher Affiliation | Academia | Tobias Schumacher (University of Mannheim, Germany; RWTH Aachen University, Germany); Markus Strohmaier (University of Mannheim, Germany; GESIS Leibniz Institute for the Social Sciences, Germany; Complexity Science Hub, Austria); Florian Lemmerich (University of Passau, Germany) |
| Pseudocode | No | The paper describes algorithms in prose within section "3. Algorithms for Quantification" but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementation of the algorithms and experiments can be found on GitHub: https://github.com/tobiasschumacher/quantification_paper |
| Open Datasets | Yes | We applied all algorithms on a broad range of 40 datasets collected from the UCI machine learning repository (https://archive.ics.uci.edu/ml/index.php) and from Kaggle (https://www.kaggle.com/datasets). An overview of these datasets, along with their characteristics and abbreviations that we use when describing our results, is given in Table 2. |
| Dataset Splits | Yes | Regarding training and test distributions, in the binary case, we considered different prevalences of training positives p_+^train and test positives p_+^test in the respective sets, with p_+^train ∈ {0.05, 0.1, 0.3, 0.5, 0.7, 0.9} and p_+^test ∈ {0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, following the protocol introduced by Forman (2008). ... In both binary and multiclass settings, we considered splits with relative amounts of training versus test data samples in {(0.1, 0.9), (0.3, 0.7), (0.5, 0.5), (0.7, 0.3)}, thereby simulating scenarios in which we have little as well as relatively much data at hand to train our models. |
| Hardware Specification | No | The authors acknowledge support by the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grant INST 35/1597-1 FUGG. This mentions an HPC resource but does not provide specific hardware details such as CPU or GPU models. |
| Software Dependencies | No | Except for the SVMperf-based quantifiers and quantification forests, all algorithms were implemented from scratch in Python 3, using scikit-learn as base implementation for the underlying classifiers and the package cvxpy (Diamond and Boyd, 2016) to solve constrained optimization problems. The versions for `scikit-learn` and `cvxpy` are not specified. |
| Experiment Setup | Yes | In our main experiments, we chose the following hyperparameters for the quantifiers: As mentioned above, for all methods that use a classifier to perform quantification, we used the logistic regression classifier with the default L-BFGS solver along with its built-in probability estimator provided by scikit-learn and set the number of maximum iterations at 1000. We always used stratified 10-fold cross-validation on the training set when estimating the misclassification rates or computing the set of scores and thresholds that the quantifiers needed. ... For the DyS framework, including the HDy method, we chose to divide its confidence scores into 10 bins... |
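
The sampling protocol quoted in the Dataset Splits row (fixed positive prevalences in the training and test sets, following Forman, 2008) can be sketched as below. This is a minimal illustration, not the paper's code: the function name and the use of sampling with replacement are our assumptions.

```python
import numpy as np

def sample_at_prevalence(y, n_samples, pos_prevalence, rng):
    """Draw indices from a binary-labeled pool (labels in {0, 1}) so that
    a pos_prevalence fraction of the sampled labels is positive."""
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_pos = int(round(n_samples * pos_prevalence))
    n_neg = n_samples - n_pos
    chosen = np.concatenate([
        rng.choice(pos_idx, size=n_pos, replace=True),
        rng.choice(neg_idx, size=n_neg, replace=True),
    ])
    rng.shuffle(chosen)
    return chosen

rng = np.random.default_rng(0)
y = np.array([0] * 80 + [1] * 20)           # pool with 20% positives
idx = sample_at_prevalence(y, n_samples=50, pos_prevalence=0.3, rng=rng)
# exactly 15 of the 50 sampled labels are positive
```

Iterating this over the p_+^train and p_+^test grids quoted above reproduces the kind of prevalence-shifted evaluation scenarios the paper describes.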
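
As one concrete instance of the setup described in the Experiment Setup row (logistic regression with `max_iter=1000`, misclassification rates estimated via stratified 10-fold cross-validation), an Adjusted Classify & Count quantifier might look like this. This is a hedged sketch of the general ACC technique, not the authors' implementation; the helper name and the clipping to [0, 1] are our choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

def acc_quantify(X_train, y_train, X_test):
    """Adjusted Classify & Count for binary labels in {0, 1}:
    correct the raw positive rate on the test set using the true- and
    false-positive rates estimated by stratified 10-fold CV on the
    training set."""
    clf = LogisticRegression(max_iter=1000)  # default L-BFGS solver
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    y_hat = cross_val_predict(clf, X_train, y_train, cv=cv)
    tpr = float(np.mean(y_hat[y_train == 1] == 1))
    fpr = float(np.mean(y_hat[y_train == 0] == 1))
    clf.fit(X_train, y_train)
    raw = float(np.mean(clf.predict(X_test) == 1))  # plain Classify & Count
    if tpr == fpr:          # adjustment undefined; fall back to raw rate
        return raw
    return float(np.clip((raw - fpr) / (tpr - fpr), 0.0, 1.0))
```

The adjustment divides out the classifier's systematic bias, which is why the CV estimates of the misclassification rates mentioned in the quote are needed before quantifying.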