HiClass: a Python Library for Local Hierarchical Classification Compatible with Scikit-learn

Authors: Fábio M. Miranda, Niklas Köhnecke, Bernhard Y. Renard

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental In Figure 2, we compare the hierarchical F-score, computational resources (measured with the command time) and disk usage. This comparison was performed between two flat classifiers, from the library scikit-learn and Microsoft's LightGBM (Ke et al., 2017), versus the local hierarchical classifiers implemented in HiClass. To avoid bias, cross-validation and hyperparameter tuning were performed on both the local hierarchical classifiers and the flat classifiers. For comparison purposes, we used a snapshot from 02/11/2022 of the consumer complaints data set provided by the Consumer Financial Protection Bureau of the United States (Bureau and General, 2022).
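The hierarchical F-score used in this comparison (the hF of Kiritchenko et al., micro-averaged over ancestor sets) can be sketched in plain Python. The label paths below are illustrative toy values, not categories from the consumer complaints data set:

```python
def hierarchical_f1(y_true, y_pred):
    """Hierarchical F-score: each label is the full path from the root,
    expanded into the set of its ancestors (prefixes), and precision/recall
    are micro-averaged over those ancestor sets."""
    inter = pred_total = true_total = 0
    for true_path, pred_path in zip(y_true, y_pred):
        # ("Credit card", "Billing") expands to {("Credit card",),
        # ("Credit card", "Billing")} -- the node plus its ancestors.
        true_set = {true_path[: i + 1] for i in range(len(true_path))}
        pred_set = {pred_path[: i + 1] for i in range(len(pred_path))}
        inter += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    hp = inter / pred_total  # hierarchical precision
    hr = inter / true_total  # hierarchical recall
    return 2 * hp * hr / (hp + hr)

# Toy example: one exact match, one prediction correct only at the top level.
y_true = [("Debt", "Collection"), ("Credit card", "Billing")]
y_pred = [("Debt", "Collection"), ("Credit card", "Fees")]
print(hierarchical_f1(y_true, y_pred))  # → 0.75
```

A flat classifier that misses the leaf still earns partial credit for a correct top-level category, which is why this metric, rather than flat F1, is the fair yardstick for the flat-versus-hierarchical comparison above.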
Researcher Affiliation Academia Fábio M. Miranda (EMAIL), Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany; Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany. Niklas Köhnecke (EMAIL), Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany. Bernhard Y. Renard (EMAIL), Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany.
Pseudocode No The paper describes the algorithms for Local Classifier Per Node, Local Classifier Per Parent Node, and Local Classifier Per Level in Appendix C using descriptive text and figures, and defines training policies in tables, but does not provide structured pseudocode or algorithm blocks.
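Although the paper gives no pseudocode, the training-set construction behind a Local Classifier Per Node is easy to sketch. The snippet below assumes one of the paper's negative training policies (the "siblings" policy: positives are examples routed through a child node, negatives are examples routed through its siblings); the toy label paths are hypothetical:

```python
from collections import defaultdict

def siblings_policy_sets(paths, parent):
    """For a Local Classifier Per Node under the 'siblings' negative
    training policy: for each child of `parent`, positives are the
    indices of examples routed through that child, negatives are the
    indices of examples routed through its sibling nodes."""
    by_child = defaultdict(list)
    depth = len(parent)
    for i, path in enumerate(paths):
        if path[:depth] == parent and len(path) > depth:
            by_child[path[depth]].append(i)
    sets = {}
    for child, pos in by_child.items():
        neg = [i for c, idxs in by_child.items() if c != child for i in idxs]
        sets[child] = (pos, neg)
    return sets

# Toy label paths (hypothetical, not from the complaints data set)
paths = [("Loan", "Mortgage"), ("Loan", "Student"), ("Card", "Fees")]
print(siblings_policy_sets(paths, parent=()))
```

One binary classifier would then be trained per child node on its (positives, negatives) pair; the other policies defined in the paper's tables differ only in which examples land in the negative set.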
Open Source Code Yes Source code and documentation are available at https://github.com/scikit-learn-contrib/hiclass.
Open Datasets Yes For comparison purposes, we used a snapshot from 02/11/2022 of the consumer complaints data set provided by the Consumer Financial Protection Bureau of the United States (Bureau and General, 2022), which after preprocessing contained 727,495 instances for cross-validation and hyperparameter tuning as well as training and 311,784 more for validation.
Dataset Splits Yes First, the data set was split, with 70% of the data being used for hyperparameter tuning and training, while 30% was held out for a final evaluation. The 70% subset held for training was further split into 5 subsets for 5-fold cross-validation and identification of the best hyperparameter combination.
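As a quick sanity check, the instance counts reported in the paper are consistent with the stated 70/30 split:

```python
# Counts reported in the paper: 727,495 instances for tuning/training,
# 311,784 held out for final validation.
train, valid = 727_495, 311_784
total = train + valid

print(round(train / total, 2))  # → 0.7 (fraction used for tuning + training)

# Each of the 5 cross-validation folds then covers roughly a fifth
# of the 70% subset.
print(train // 5)  # → 145499 instances per fold (approximately)
```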
Hardware Specification Yes The benchmark was computed on multiple cluster nodes running GNU/Linux with 512 GB physical memory and 128 cores provided by two AMD EPYC 7742 processors.
Software Dependencies No The paper mentions 'Packages for Python 3.7-3.9' and various libraries such as 'scikit-learn', 'NumPy', 'NetworkX', 'Ray', 'Joblib', 'Hydra', 'Optuna', and 'LightGBM'. While Python has a version range, specific version numbers for the other key software components used in the methodology are not explicitly provided in the text.
Experiment Setup Yes For hyperparameter tuning, the models were trained using 4 folds as training data and validated on the remaining one. This process was repeated 5 times, with a different fold combination used in each iteration, and the average hierarchical F-score was reported as the performance metric. The selection of the best hyperparameters was assisted by Hydra (Meta, 2022) and its plugin Optuna (Akiba et al., 2019), through a grid search over the combinations of hyperparameters described in Tables 2-4.
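The paper's actual search was driven by Hydra and Optuna; the pure-Python sketch below only illustrates the bookkeeping of a grid search with 5-fold cross-validation. The grid values and the scoring function are placeholders, not the hyperparameters from Tables 2-4:

```python
from itertools import product
from statistics import mean

def grid_search_cv(grid, evaluate, n_folds=5):
    """Exhaustive grid search: score every hyperparameter combination by
    its mean score across the folds, and return the best (score, params).
    `evaluate(params, fold)` stands in for training on 4 folds and
    validating on the held-out one."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = mean(evaluate(params, fold) for fold in range(n_folds))
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Toy objective (hypothetical): scores peak at C == 1.0
grid = {"C": [0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}
score, params = grid_search_cv(grid, lambda p, fold: -abs(p["C"] - 1.0))
print(params)
```

With the grid sizes in Tables 2-4, every combination is trained 5 times, which is why the benchmark also tracks computational resources alongside the hierarchical F-score.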