OpenML-Python: an extensible Python API for OpenML

Authors: Matthias Feurer, Jan N. van Rijn, Arlind Kadra, Pieter Gijsbers, Neeratyoy Mallik, Sahithya Ravi, Andreas Müller, Joaquin Vanschoren, Frank Hutter

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce OpenML-Python, a client API for Python, which opens up the OpenML platform for a wide range of Python-based machine learning tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn extension and an extension mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem.
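The abstract above describes the core client workflow: fetch a task, run a local model against it, and optionally publish the result. A minimal sketch of that workflow follows; it assumes the `openml` and `scikit-learn` packages are installed and that openml.org is reachable, and the task id 31 is purely illustrative (this is a hedged sketch, not code from the paper).

```python
# Hedged sketch of the OpenML-Python client workflow described above,
# not code taken from the paper. Assumes `openml` and `scikit-learn`
# are installed and openml.org is reachable; task id 31 is illustrative.
try:
    import openml
    from sklearn.tree import DecisionTreeClassifier

    task = openml.tasks.get_task(31)  # a task: dataset + splits + target
    clf = DecisionTreeClassifier()
    run = openml.runs.run_model_on_task(clf, task)  # evaluate locally
    # run.publish()  # would upload the run (results + flow) to OpenML
except Exception:  # degrade gracefully when offline or packages are missing
    run = None
```

The try/except is only there so the sketch degrades gracefully without network access; in practice one would let errors surface.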
Researcher Affiliation | Collaboration | Matthias Feurer EMAIL University of Freiburg, Freiburg, Germany; Jan N. van Rijn EMAIL Leiden University, Leiden, Netherlands; Arlind Kadra EMAIL University of Freiburg, Freiburg, Germany; Pieter Gijsbers EMAIL Eindhoven University of Technology, Eindhoven, Netherlands; Neeratyoy Mallik EMAIL University of Freiburg, Freiburg, Germany; Sahithya Ravi EMAIL Eindhoven University of Technology, Eindhoven, Netherlands; Andreas Müller EMAIL Microsoft, Sunnyvale, USA; Joaquin Vanschoren EMAIL Eindhoven University of Technology, Eindhoven, Netherlands; Frank Hutter EMAIL University of Freiburg & Bosch Center for Artificial Intelligence, Freiburg, Germany
Pseudocode | No | The paper provides Python code examples in Figure 1 and Figure 2 to demonstrate the API's usage, but these are concrete code snippets, not pseudocode or abstract algorithm blocks.
Open Source Code | Yes | Source code and documentation are available at https://github.com/openml/openml-python/.
Open Datasets | Yes | OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. It goes beyond open data repositories, such as UCI (Dua and Graff, 2019), PMLB (Olson et al., 2017), the datasets submodules in scikit-learn and tensorflow (Pedregosa et al., 2011; Abadi et al., 2016)...
Dataset Splits | Yes | To facilitate contributions from the community, it allows people to upload new datasets in only two function calls, and to define new tasks on them (combinations of a dataset, train/test split and target attribute). For instance, an experiment (run) shared on OpenML can show how a random forest (flow) performs on Iris (dataset) if evaluated with 10-fold cross-validation (task), and how to reproduce that result.
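A task pins the train/test splits so that every shared run is evaluated on identical data partitions. As a toy, stdlib-only illustration (not OpenML's actual split format), 10-fold cross-validation splits can be represented as reusable index sets:

```python
# Illustrative sketch (not OpenML's actual implementation): representing a
# task's 10-fold cross-validation splits as deterministic, reusable index
# sets, which is what makes shared results comparable and reproducible.
def kfold_indices(n_samples: int, n_folds: int = 10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Assign sample i to fold i % n_folds, deterministically.
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx

# e.g. 150 instances, as in the Iris dataset mentioned above
splits = list(kfold_indices(150))
```

Because the fold assignment is deterministic, anyone re-running the sketch reproduces the same partitions, mirroring how OpenML tasks fix splits server-side.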
Hardware Specification | No | The paper describes a software API and provides code examples for its use. It does not mention any specific hardware (e.g., GPU/CPU models, memory, processor types) used by the authors to conduct the work or generate the examples presented in the paper.
Software Dependencies | No | The package is written in Python3 and open-sourced with a 3-Clause BSD License. For instance, we build documentation using the popular sphinx Python documentation generator, use an extension to automatically compile examples into documentation and Jupyter notebooks, and employ standard open-source packages for scientific computing such as numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), and pandas (McKinney, 2010). The paper names Python3 and several libraries with citations, but does not provide specific version numbers for these key components: "Python3" denotes only the major version, not an exact release, and the citations refer to the libraries' original publications rather than the versions actually used.
Experiment Setup | No | The paper presents an API and provides Python code examples demonstrating its use (Figures 1 and 2). While Figure 1 shows retrieval of results for an SVM with 'C' and 'gamma' parameters, these are existing evaluations from the OpenML server. Figure 2 demonstrates building a classification pipeline with a Decision Tree Classifier but does not specify concrete hyperparameters or system-level training settings for an experiment conducted by the authors in this paper.