Pycobra: A Python Toolbox for Ensemble Learning and Visualisation

Authors: Benjamin Guedj, Bhargav Srinivasa Desikan

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce pycobra, a Python library devoted to ensemble learning (regression and classification) and visualisation. Its main assets are the implementation of several ensemble learning algorithms, a flexible and generic interface to compare and blend any existing machine learning algorithm available in Python libraries (as long as a predict method is given), and visualisation tools such as Voronoi tessellations. pycobra allows the user to gauge the performance of the preliminary predictors used in the aggregation, with built-in methods to easily plot boxplots and QQ-plots. A salient feature of pycobra is using Voronoi tessellations for generic visualisation. Figure 1: Assessing the performance of regression machines and COBRA.
Researcher Affiliation | Academia | Benjamin Guedj (EMAIL), Modal project-team, Lille Nord Europe research center, Inria, France; Bhargav Srinivasa Desikan (EMAIL), Modal project-team, Lille Nord Europe research center, Inria, France
Pseudocode | Yes | Algorithm 1 presents the pseudo-code of the COBRA implementation. All the pycobra estimators are scikit-learn compatible and can be used as part of the existing scikit-learn ecosystem, such as GridSearchCV and Pipeline. While hyperparameter initialisation is systematically done using scikit-learn's GridSearchCV, pycobra's Diagnostics class allows us to compare between different combinations of the constituent predictors and data-splitting, among other basic parameters. Algorithm 1: The original COBRA algorithm from Biau et al. (2016).
Open Source Code | Yes | pycobra is fully scikit-learn compatible and is released under the MIT open-source license. pycobra can be downloaded from the Python Package Index (PyPI) and Machine Learning Open Source Software (MLOSS). The current version (along with Jupyter notebooks, extensive documentation, and continuous integration tests) is available at https://github.com/bhargavvader/pycobra and official documentation website is https://modal.lille.inria.fr/pycobra.
Open Datasets | No | The paper describes a software library and its features, but does not provide concrete access information for any specific dataset used for evaluation or examples within the paper itself. It discusses the use of scikit-learn implementations but does not specify any particular datasets used for experiments or their public availability.
Dataset Splits | No | The paper describes a software library and its features. While it mentions that pycobra's Diagnostics class allows comparison between different data-splitting parameters, it does not provide specific dataset split information (percentages, sample counts, or predefined splits) for any experiments presented in the paper. No specific datasets are used or split for results presented in the paper.
Hardware Specification | No | The paper describes a software library and its features but does not provide any specific hardware details (GPU/CPU models, processor types, or memory amounts) used for its development or for running any presented examples or experiments.
Software Dependencies | No | The paper mentions several software dependencies like NumPy (Walt et al., 2011), scikit-learn (Pedregosa et al., 2011), Matplotlib (Hunter, 2007), SciPy (Jones et al., 2001), and Jupyter IPython notebooks (Perez and Granger, 2007), but it does not provide specific version numbers for these components, which is required for reproducibility.
Experiment Setup | No | The paper describes the pycobra library and its functionalities. While it mentions hyperparameter initialisation using scikit-learn's GridSearchCV as a feature of pycobra, it does not provide concrete hyperparameter values, training configurations, or system-level settings for any specific experiments or results presented in the paper itself.
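
The Pseudocode entry above refers to Algorithm 1, the COBRA aggregation rule of Biau et al. (2016). As an illustration only, the consensus idea behind it can be sketched in a few lines of dependency-free Python. This is not pycobra's code or API: the function name cobra_predict, its parameters, and the toy machines are all hypothetical, and the machines are assumed to be already fitted on a separate part of the data.

```python
def cobra_predict(machines, X_l, y_l, x_query, epsilon, alpha=1.0):
    """Consensus aggregation in the spirit of COBRA (illustrative sketch).

    machines : list of callables, each mapping one input point to a prediction
               (stand-ins for the predict methods of fitted estimators).
    X_l, y_l : held-out points and responses used only for aggregation.
    epsilon  : closeness threshold on the machines' predictions.
    alpha    : fraction of machines that must agree (1.0 means all of them).
    """
    # What each machine predicts at the query point.
    query_preds = [m(x_query) for m in machines]
    needed = alpha * len(machines)

    retained = []
    for x_i, y_i in zip(X_l, y_l):
        # Count machines whose prediction at x_i is within epsilon
        # of their own prediction at the query point.
        votes = sum(
            abs(m(x_i) - p) <= epsilon for m, p in zip(machines, query_preds)
        )
        if votes >= needed:
            retained.append(y_i)

    if not retained:
        return 0.0  # convention assumed here when no point is retained
    # The aggregate is the average response over the retained points.
    return sum(retained) / len(retained)


# Toy usage with two hypothetical "machines" on one-dimensional inputs.
toy_machines = [lambda x: x[0], lambda x: 2 * x[0]]
toy_X = [[0.0], [1.0], [2.0], [3.0]]
toy_y = [0.0, 1.0, 2.0, 3.0]
strict = cobra_predict(toy_machines, toy_X, toy_y, [1.2], epsilon=0.5)
relaxed = cobra_predict(toy_machines, toy_X, toy_y, [1.2], epsilon=1.0, alpha=0.5)
```

Note that the held-out points are compared through the machines' predictions, not through distances in the input space; lowering alpha relaxes the unanimity requirement, so more points are retained for the average.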