OpenML-Python: an extensible Python API for OpenML
Authors: Matthias Feurer, Jan N. van Rijn, Arlind Kadra, Pieter Gijsbers, Neeratyoy Mallik, Sahithya Ravi, Andreas Müller, Joaquin Vanschoren, Frank Hutter
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we introduce OpenML-Python, a client API for Python, which opens up the OpenML platform for a wide range of Python-based machine learning tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn extension and an extension mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem. |
| Researcher Affiliation | Collaboration | Matthias Feurer, University of Freiburg, Freiburg, Germany; Jan N. van Rijn, Leiden University, Leiden, Netherlands; Arlind Kadra, University of Freiburg, Freiburg, Germany; Pieter Gijsbers, Eindhoven University of Technology, Eindhoven, Netherlands; Neeratyoy Mallik, University of Freiburg, Freiburg, Germany; Sahithya Ravi, Eindhoven University of Technology, Eindhoven, Netherlands; Andreas Müller, Microsoft, Sunnyvale, USA; Joaquin Vanschoren, Eindhoven University of Technology, Eindhoven, Netherlands; Frank Hutter, University of Freiburg & Bosch Center for Artificial Intelligence, Freiburg, Germany |
| Pseudocode | No | The paper provides Python code examples in Figure 1 and Figure 2 to demonstrate the API's usage, but these are concrete code snippets, not pseudocode or abstract algorithm blocks. |
| Open Source Code | Yes | Source code and documentation are available at https://github.com/openml/openml-python/. |
| Open Datasets | Yes | OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. It goes beyond open data repositories, such as UCI (Dua and Graff, 2019), PMLB (Olson et al., 2017), the datasets submodules in scikit-learn and tensorflow (Pedregosa et al., 2011; Abadi et al., 2016)... |
| Dataset Splits | Yes | To facilitate contributions from the community, it allows people to upload new datasets in only two function calls, and to define new tasks on them (combinations of a dataset, train/test split and target attribute). For instance, an experiment (run) shared on OpenML can show how a random forest (flow) performs on Iris (dataset) if evaluated with 10-fold cross-validation (task), and how to reproduce that result. |
| Hardware Specification | No | The paper describes a software API and provides code examples for its use. It does not mention any specific hardware (e.g., GPU/CPU models, memory, processor types) used by the authors to conduct the work or generate the examples presented in the paper. |
| Software Dependencies | No | The package is written in Python3 and open-sourced with a 3-Clause BSD License. For instance, we build documentation using the popular sphinx Python documentation generator, use an extension to automatically compile examples into documentation and Jupyter notebooks, and employ standard open-source packages for scientific computing such as numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), and pandas (McKinney, 2010). The paper mentions Python3 and names several libraries with citations, but gives no version numbers for any of these components. "Python3" identifies only the major version, not a specific release, and the citations point to the libraries' general publications rather than to the versions actually used. |
| Experiment Setup | No | The paper presents an API and provides Python code examples demonstrating its use (Figures 1 and 2). While Figure 1 shows retrieval of results for an SVM with 'C' and 'gamma' parameters, these are existing evaluations from the OpenML server. Figure 2 demonstrates building a classification pipeline with a DecisionTreeClassifier but does not specify concrete hyperparameters or system-level training settings for an experiment conducted by the authors in this paper. |
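
The Software Dependencies row notes that the paper names its libraries without version numbers. As a minimal sketch of how that gap could be closed in one's own experiments, the snippet below records the installed version of each package using only the Python standard library. The helper name `installed_versions` is hypothetical, and the PyPI distribution names in the example list (`scikit-learn`, `openml`) are assumptions about the local environment; none of this is taken from the paper.

```python
from importlib import metadata
import platform


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of OpenML-Python.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to the environment.
    names = ["numpy", "scipy", "pandas", "scikit-learn", "openml"]
    print("python", platform.python_version())
    for name, version in installed_versions(names).items():
        print(name, version if version is not None else "not installed")
```

Storing this output next to experiment results gives a reader the exact versions that a citation alone does not convey.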