scikit-multilearn: A Python library for Multi-Label Classification
Authors: Piotr Szymański, Tomasz Kajdanowicz
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have tested MEKA, MULAN and scikit-multilearn on 12 well-cited benchmark multi-label classification datasets using two comparison scenarios: Binary Relevance and Label Powerset. The two selected problem transformation approaches are widely used both in regular classification tasks and as the base for more sophisticated methods. We did not test algorithm adaptation methods, as no algorithm adaptation method is present in all three libraries. We use these methods to illustrate two aspects of the classification performance of the libraries: the cost of using many classifiers with splitting operations performed on the label space matrix, and the cost of using a single classifier which requires access to all label combinations to perform the transformation. |
| Researcher Affiliation | Academia | Piotr Szymański (piotr.szymanski@{pwr.edu.pl,illimites.edu.pl}), Department of Computational Intelligence, Wrocław University of Science and Technology, Wrocław, Poland; illimites foundation, Wrocław, Poland. Tomasz Kajdanowicz (EMAIL), Department of Computational Intelligence, Wrocław University of Science and Technology, Wrocław, Poland |
| Pseudocode | No | The paper describes methods and algorithms conceptually within the text, but does not include any clearly labeled pseudocode blocks, algorithm figures, or structured code-like procedures. |
| Open Source Code | Yes | Source code and documentation can be downloaded from http://scikit.ml and also via pip. The project is BSD-licensed. (...) Development is managed on the GitHub repository scikit-multilearn |
| Open Datasets | No | The paper mentions: "We have tested MEKA, MULAN and scikit-multilearn on 12 well-cited benchmark multi-label classification datasets". However, it does not provide specific names of these datasets, their sources, links, or citations, making it impossible to concretely access them. |
| Dataset Splits | No | The paper mentions using "12 well-cited benchmark multi-label classification datasets" but does not specify any training, testing, or validation splits, nor does it refer to a standard split or cross-validation setup for these datasets. |
| Hardware Specification | No | The paper states: "All the libraries were forced to use a single core using the taskset command to minimize parallelization effects on the comparison." and "All results taken into consideration reported that 100% of their CPU core had been assigned to the process which performed the classification scenario.". However, it does not specify the model or type of CPU, GPU, or any other hardware component used for the experiments. |
| Software Dependencies | Yes | The benchmark is performed with scikit-multilearn 0.1.0, MEKA 1.9.2, MULAN 1.5.0, scikit-learn 0.19.2, Octave 4.2.2, shogun 6.1.3, MLC Toolbox for Matlab/Octave code from GitHub master commit hash e798779, R 3.4.1, utiml 0.1.2. (...) All commits undergo continuous testing on Windows, Ubuntu and Mac OS X, both under Python 2.7 and 3.3. |
| Experiment Setup | Yes | To minimize the impact of base classifiers, we have decided to use a fast Random Forest base classifier with 10 trees. As Octave does not provide Matlab's random forest implementation, we used the one provided by the shogun toolbox (Sonnenburg et al., 2017). We have checked the classification quality and did not find significant differences between Hamming Loss, Jaccard and Accuracy scores between the outputs. |
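The two problem transformation schemes quoted in the table (Binary Relevance splitting the label space matrix per label, and Label Powerset mapping each label combination to a single class) can be sketched in plain Python. The function names and the toy label matrix below are illustrative, not the API of any of the benchmarked libraries:

```python
# Sketch of the two problem-transformation schemes compared in the benchmark.
# Y is a toy binary label matrix: rows are samples, columns are labels.

def binary_relevance_split(Y):
    """Binary Relevance: split the label matrix column-wise.

    Each label column becomes an independent binary target vector,
    so one classifier is trained per label (many classifiers, cheap
    per-problem transformation)."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]

def label_powerset_encode(Y):
    """Label Powerset: map each distinct label combination to a class id.

    A single multi-class classifier is trained, but the transformation
    must enumerate every label combination present in the data."""
    classes = {}
    encoded = []
    for row in Y:
        key = tuple(row)
        if key not in classes:
            classes[key] = len(classes)
        encoded.append(classes[key])
    return encoded, classes

Y = [[1, 0, 1],
     [0, 1, 1],
     [1, 0, 1]]

per_label = binary_relevance_split(Y)      # three binary problems
codes, mapping = label_powerset_encode(Y)  # one multi-class problem
```

This mirrors the trade-off the authors measure: Binary Relevance pays for many classifiers and repeated label-matrix splits, while Label Powerset pays for a single transformation pass over all label combinations.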
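The single-core constraint described in the Hardware Specification row can be reproduced with the `taskset` command the paper names; the core index and script name below are placeholders, not the paper's actual invocation:

```shell
# Pin the benchmark process to CPU core 0 so parallelization does not
# skew the timing comparison (core index and script name are illustrative).
taskset -c 0 python run_benchmark.py
```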