On the Bayes-Optimality of F-Measure Maximizers

Authors: Willem Waegeman, Krzysztof Dembczyński, Arkadiusz Jachnik, Weiwei Cheng, Eyke Hüllermeier

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. ... In Section 7, we present extensive experimental results to illustrate the practical usefulness of our findings. More specifically, all examined methods are compared for a series of multi-label classification problems."
Researcher Affiliation | Collaboration | "Willem Waegeman EMAIL Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent 9000, Belgium ... Weiwei Cheng EMAIL Amazon Development Center Germany, Berlin 10707, Germany"
Pseudocode | Yes | "Algorithm 1 General F-measure Maximizer"
Open Source Code | No | The paper points to external software used for comparison ("The results were obtained by using the software available at http://users.cecs.anu.edu.au/~jpetterson/."), but it provides no statement or link for code implementing the authors' own methodology (e.g., the GFM algorithm).
Open Datasets | Yes | "We test some of the algorithms described above on four commonly used multi-label benchmark data sets with known training and test sets. We take these data sets from the MULAN and LibSVM repositories." The MULAN repository is at http://mulan.sourceforge.net/datasets.html; the LibSVM repository is at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html.
Dataset Splits | Yes | "We test some of the algorithms described above on four commonly used multi-label benchmark data sets with known training and test sets. ... We use 5-fold cross-validation and choose the regularization parameter from the following set of possible values {10^-4, 10^-3, ..., 10^3}. ... This is a minor difference in comparison to the competition results, which are computed over 90% of test examples. The remaining 10% of test examples constitute a validation set that served for computing the scores for the leader board during the competition."
Hardware Specification | Yes | "We run these simulations, as well as the other experiments described later in this paper, on a Debian virtual machine with 8-core x64 processor and 5GB RAM."
Software Dependencies | No | The paper mentions using "Weka (Hall et al., 2009)", "Mulan (Tsoumakas et al., 2011)", and "Mallet (McCallum, 2002)", but specific version numbers for these software packages, or for the programming languages used, are not provided.
Experiment Setup | Yes | "We use a different number of nearest neighbors, l ∈ {10, 20, 50, 100}. ... We use 5-fold cross-validation and choose the regularization parameter from the following set of possible values {10^-4, 10^-3, ..., 10^3}. ... The maximal number of iterations in the cutting-plane algorithm has been set to 1000."
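For context on the pseudocode row: the General F-measure Maximizer (GFM) referred to as Algorithm 1 can be sketched roughly as below. This is a hedged reconstruction from the standard description of the method, not the authors' code; it assumes a matrix P with entries P[i, s-1] = P(y_i = 1, |y| = s) and the probability p_zero = P(y = 0), and returns the prediction h maximizing the expected F1-measure (with F(0, 0) defined as 1).

```python
import numpy as np

def gfm(P, p_zero):
    """Sketch of the General F-measure Maximizer for F1.

    P:      (m, m) array, P[i, s-1] = P(y_i = 1, |y| = s)
    p_zero: P(y = 0), the probability of the all-zeros label vector
    Returns (h, expected_F1) for the maximizing prediction h.
    """
    m = P.shape[0]
    # W[s-1, k-1] = 2 / (s + k): contribution of label i to E[F1]
    # when |y| = s and the prediction has |h| = k.
    s = np.arange(1, m + 1)
    W = 2.0 / (s[:, None] + s[None, :])
    Delta = P @ W  # Delta[i, k-1] = sum_s P[i, s-1] * 2 / (s + k)
    # Start from the empty prediction, whose expected F1 is p_zero.
    best_F, best_h = p_zero, np.zeros(m, dtype=int)
    for k in range(1, m + 1):
        # For fixed |h| = k, the inner maximizer picks the k labels
        # with the largest Delta[:, k-1] values.
        top = np.argsort(-Delta[:, k - 1])[:k]
        F = float(Delta[top, k - 1].sum())
        if F > best_F:
            best_F = F
            best_h = np.zeros(m, dtype=int)
            best_h[top] = 1
    return best_h, best_F
```

For example, with m = 2 labels and the degenerate distribution where y = (1, 0) with probability one, the sketch recovers h = (1, 0) with expected F1 equal to 1.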
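The tuning protocol quoted in the splits and setup rows (5-fold cross-validation over the regularization grid {10^-4, ..., 10^3}) can be sketched generically as follows. The fit/score callables and the ridge-regression stand-in are illustrative assumptions, not the paper's learners.

```python
import numpy as np

def five_fold_grid_search(X, y, fit, score, grid):
    """Pick a regularization value by 5-fold cross-validation.

    fit(X, y, c) -> model and score(model, X, y) -> scalar (higher is
    better) are caller-supplied; this API is an assumption for the sketch.
    """
    folds = np.array_split(np.arange(len(y)), 5)
    best_c, best_score = None, -np.inf
    for c in grid:
        fold_scores = []
        for i, val in enumerate(folds):
            trn = np.concatenate([f for j, f in enumerate(folds) if j != i])
            fold_scores.append(score(fit(X[trn], y[trn], c), X[val], y[val]))
        mean = float(np.mean(fold_scores))
        if mean > best_score:
            best_c, best_score = c, mean
    return best_c

# Illustrative use with a ridge-regression stand-in for the learner.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=100)
grid = [10.0 ** e for e in range(-4, 4)]  # {10^-4, 10^-3, ..., 10^3}
ridge_fit = lambda X, y, c: np.linalg.solve(X.T @ X + c * np.eye(X.shape[1]), X.T @ y)
neg_mse = lambda w, X, y: -float(np.mean((X @ w - y) ** 2))
best_c = five_fold_grid_search(X, y, ridge_fit, neg_mse, grid)
```

On this low-noise synthetic data the search settles on a small regularization value; the point is only the protocol, since the paper's own learners and features differ.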