On the Bayes-Optimality of F-Measure Maximizers

Authors: Willem Waegeman, Krzysztof Dembczyński, Arkadiusz Jachnik, Weiwei Cheng, Eyke Hüllermeier

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. ... In Section 7, we present extensive experimental results to illustrate the practical usefulness of our findings. More specifically, all examined methods are compared for a series of multi-label classification problems."
Researcher Affiliation | Collaboration | "Willem Waegeman EMAIL Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent 9000, Belgium ... Weiwei Cheng EMAIL Amazon Development Center Germany, Berlin 10707, Germany"
Pseudocode | Yes | "Algorithm 1 General F-measure Maximizer"
Open Source Code | No | The paper points to external software used for comparison ("The results were obtained by using the software available at http://users.cecs.anu.edu.au/~jpetterson/."), but it provides no statement or link for code implementing the authors' own methodology (e.g., the GFM algorithm).
Open Datasets | Yes | "We test some of the algorithms described above on four commonly used multi-label benchmark data sets with known training and test sets. We take these data sets from the MULAN and LibSVM repositories." The MULAN repository is at http://mulan.sourceforge.net/datasets.html; the LibSVM repository is at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html.
Dataset Splits | Yes | "We test some of the algorithms described above on four commonly used multi-label benchmark data sets with known training and test sets. ... We use 5-fold cross-validation and choose the regularization parameter from the following set of possible values {10^-4, 10^-3, ..., 10^3}. ... This is a minor difference in comparison to the competition results, which are computed over 90% of test examples. The remaining 10% of test examples constitute a validation set that served for computing the scores for the leader board during the competition."
Hardware Specification | Yes | "We run these simulations, as well as the other experiments described later in this paper, on a Debian virtual machine with 8-core x64 processor and 5GB RAM."
Software Dependencies | No | The paper mentions using "Weka (Hall et al., 2009)", "Mulan (Tsoumakas et al., 2011)", and "Mallet (McCallum, 2002)", but specific version numbers for these software packages, or for the programming languages used, are not provided.
Experiment Setup | Yes | "We use a different number of nearest neighbors, l ∈ {10, 20, 50, 100}. ... We use 5-fold cross-validation and choose the regularization parameter from the following set of possible values {10^-4, 10^-3, ..., 10^3}. ... The maximal number of iterations in the cutting-plane algorithm has been set to 1000."
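For context on the pseudocode row: the General F-measure Maximizer (GFM) referred to as Algorithm 1 can be sketched roughly as below. This is a hedged reconstruction from the standard description of the method, not the authors' code; it assumes a matrix P with entries P[i, s-1] = P(y_i = 1, |y| = s) and the probability p_zero = P(y = 0), and returns the prediction h maximizing the expected F1-measure (with F(0, 0) defined as 1).

```python
import numpy as np

def gfm(P, p_zero):
    """Sketch of the General F-measure Maximizer for F1.

    P:      (m, m) array, P[i, s-1] = P(y_i = 1, |y| = s)
    p_zero: P(y = 0), the probability of the all-zeros label vector
    Returns (h, expected_F1) for the maximizing prediction h.
    """
    m = P.shape[0]
    # W[s-1, k-1] = 2 / (s + k): contribution of label i to E[F1]
    # when |y| = s and the prediction has |h| = k.
    s = np.arange(1, m + 1)
    W = 2.0 / (s[:, None] + s[None, :])
    Delta = P @ W  # Delta[i, k-1] = sum_s P[i, s-1] * 2 / (s + k)
    # Start from the empty prediction, whose expected F1 is p_zero.
    best_F, best_h = p_zero, np.zeros(m, dtype=int)
    for k in range(1, m + 1):
        # For fixed |h| = k, the inner maximizer picks the k labels
        # with the largest Delta[:, k-1] values.
        top = np.argsort(-Delta[:, k - 1])[:k]
        F = float(Delta[top, k - 1].sum())
        if F > best_F:
            best_F = F
            best_h = np.zeros(m, dtype=int)
            best_h[top] = 1
    return best_h, best_F
```

For example, with m = 2 labels and the degenerate distribution where y = (1, 0) with probability one, the sketch recovers h = (1, 0) with expected F1 equal to 1.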
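The tuning protocol quoted in the splits and setup rows (5-fold cross-validation over the regularization grid {10^-4, ..., 10^3}) can be sketched generically as follows. The fit/score callables and the ridge-regression stand-in are illustrative assumptions, not the paper's learners.

```python
import numpy as np

def five_fold_grid_search(X, y, fit, score, grid):
    """Pick a regularization value by 5-fold cross-validation.

    fit(X, y, c) -> model and score(model, X, y) -> scalar (higher is
    better) are caller-supplied; this API is an assumption for the sketch.
    """
    folds = np.array_split(np.arange(len(y)), 5)
    best_c, best_score = None, -np.inf
    for c in grid:
        fold_scores = []
        for i, val in enumerate(folds):
            trn = np.concatenate([f for j, f in enumerate(folds) if j != i])
            fold_scores.append(score(fit(X[trn], y[trn], c), X[val], y[val]))
        mean = float(np.mean(fold_scores))
        if mean > best_score:
            best_c, best_score = c, mean
    return best_c

# Illustrative use with a ridge-regression stand-in for the learner.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=100)
grid = [10.0 ** e for e in range(-4, 4)]  # {10^-4, 10^-3, ..., 10^3}
ridge_fit = lambda X, y, c: np.linalg.solve(X.T @ X + c * np.eye(X.shape[1]), X.T @ y)
neg_mse = lambda w, X, y: -float(np.mean((X @ w - y) ** 2))
best_c = five_fold_grid_search(X, y, ridge_fit, neg_mse, grid)
```

On this low-noise synthetic data the search settles on a small regularization value; the point is only the protocol, since the paper's own learners and features differ.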