OpenML-Python: an extensible Python API for OpenML

Authors: Matthias Feurer, Jan N. van Rijn, Arlind Kadra, Pieter Gijsbers, Neeratyoy Mallik, Sahithya Ravi, Andreas Müller, Joaquin Vanschoren, Frank Hutter

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce OpenML-Python, a client API for Python, which opens up the OpenML platform for a wide range of Python-based machine learning tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn extension and an extension mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem.
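The abstract above describes the core client workflow: fetch a task, run a local model against it, and optionally publish the result. A minimal sketch of that workflow follows; it assumes the `openml` and `scikit-learn` packages are installed and that openml.org is reachable, and the task id 31 is purely illustrative (this is a hedged sketch, not code from the paper).

```python
# Hedged sketch of the OpenML-Python client workflow described above,
# not code taken from the paper. Assumes `openml` and `scikit-learn`
# are installed and openml.org is reachable; task id 31 is illustrative.
try:
    import openml
    from sklearn.tree import DecisionTreeClassifier

    task = openml.tasks.get_task(31)  # a task: dataset + splits + target
    clf = DecisionTreeClassifier()
    run = openml.runs.run_model_on_task(clf, task)  # evaluate locally
    # run.publish()  # would upload the run (results + flow) to OpenML
except Exception:  # degrade gracefully when offline or packages are missing
    run = None
```

The try/except is only there so the sketch degrades gracefully without network access; in practice one would let errors surface.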
Researcher Affiliation | Collaboration | Matthias Feurer EMAIL University of Freiburg, Freiburg, Germany; Jan N. van Rijn EMAIL Leiden University, Leiden, Netherlands; Arlind Kadra EMAIL University of Freiburg, Freiburg, Germany; Pieter Gijsbers EMAIL Eindhoven University of Technology, Eindhoven, Netherlands; Neeratyoy Mallik EMAIL University of Freiburg, Freiburg, Germany; Sahithya Ravi EMAIL Eindhoven University of Technology, Eindhoven, Netherlands; Andreas Müller EMAIL Microsoft, Sunnyvale, USA; Joaquin Vanschoren EMAIL Eindhoven University of Technology, Eindhoven, Netherlands; Frank Hutter EMAIL University of Freiburg & Bosch Center for Artificial Intelligence, Freiburg, Germany
Pseudocode | No | The paper provides Python code examples in Figure 1 and Figure 2 to demonstrate the API's usage, but these are concrete code snippets, not pseudocode or abstract algorithm blocks.
Open Source Code | Yes | Source code and documentation are available at https://github.com/openml/openml-python/.
Open Datasets | Yes | OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. It goes beyond open data repositories, such as UCI (Dua and Graff, 2019), PMLB (Olson et al., 2017), the datasets submodules in scikit-learn and tensorflow (Pedregosa et al., 2011; Abadi et al., 2016)...
Dataset Splits | Yes | To facilitate contributions from the community, it allows people to upload new datasets in only two function calls, and to define new tasks on them (combinations of a dataset, train/test split and target attribute). For instance, an experiment (run) shared on OpenML can show how a random forest (flow) performs on Iris (dataset) if evaluated with 10-fold cross-validation (task), and how to reproduce that result.
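A task pins the train/test splits so that every shared run is evaluated on identical data partitions. As a toy, stdlib-only illustration (not OpenML's actual split format), 10-fold cross-validation splits can be represented as reusable index sets:

```python
# Illustrative sketch (not OpenML's actual implementation): representing a
# task's 10-fold cross-validation splits as deterministic, reusable index
# sets, which is what makes shared results comparable and reproducible.
def kfold_indices(n_samples: int, n_folds: int = 10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Assign sample i to fold i % n_folds, deterministically.
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx

# e.g. 150 instances, as in the Iris dataset mentioned above
splits = list(kfold_indices(150))
```

Because the fold assignment is deterministic, anyone re-running the sketch reproduces the same partitions, mirroring how OpenML tasks fix splits server-side.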
Hardware Specification | No | The paper describes a software API and provides code examples for its use. It does not mention any specific hardware (e.g., GPU/CPU models, memory, processor types) used by the authors to conduct the work or generate the examples presented in the paper.
Software Dependencies | No | The package is written in Python3 and open-sourced with a 3-Clause BSD License. For instance, we build documentation using the popular sphinx Python documentation generator, use an extension to automatically compile examples into documentation and Jupyter notebooks, and employ standard open-source packages for scientific computing such as numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), and pandas (McKinney, 2010). The paper names Python3 and several libraries with citations, but does not provide specific version numbers for these key components: "Python3" denotes only the major version, not an exact release, and the citations refer to the libraries' original publications rather than the versions actually used.
Experiment Setup | No | The paper presents an API and provides Python code examples demonstrating its use (Figures 1 and 2). While Figure 1 shows retrieval of results for an SVM with 'C' and 'gamma' parameters, these are existing evaluations from the OpenML server. Figure 2 demonstrates building a classification pipeline with a Decision Tree Classifier but does not specify concrete hyperparameters or system-level training settings for an experiment conducted by the authors in this paper.