Scikit-Multiflow: A Multi-output Streaming Framework

Authors: Jacob Montiel, Jesse Read, Albert Bifet, Talel Abdessalem

JMLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | scikit-multiflow is a framework for learning from data streams and multi-output learning in Python. Conceived to serve as a platform to encourage the democratization of stream learning research, it provides multiple state-of-the-art learning methods, data generators and evaluators for different stream learning problems, including single-output, multi-output and multi-label.
Researcher Affiliation | Academia | Jacob Montiel EMAIL LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France; Jesse Read EMAIL LIX, École Polytechnique, Palaiseau, France; Albert Bifet EMAIL LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France; Talel Abdessalem EMAIL LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France, and UMI CNRS IPAL, National University of Singapore
Pseudocode | Yes | The sequence to train a Stream Model and track its performance using prequential evaluation in scikit-multiflow is outlined in Figure 1. Figure 1: Training and testing a stream model using scikit-multiflow. This sequence corresponds to prequential evaluation. Steps, repeated [while there is data in the stream]: 1: evaluate(stream, model); 2: get next sample; 3: X, y_true = next sample; 4: predict(X); 5: y_predicted = Prediction; 6: results = evaluate(y_true, y_predicted); 7 [m samples passed]: update_metrics(last_result); 8 [m samples passed]: update_plot(last_result); 9: partial_fit(X); 10: trained model.
Open Source Code | Yes | The source code of the package is publicly available on GitHub at https://github.com/scikit-multiflow/scikit-multiflow
Open Datasets | No | The paper describes 'Stream generators' such as Agrawal, Hyperplane, Led, Mixed, Random RBF, etc., which are features of the scikit-multiflow framework for generating data streams. However, it does not explicitly state the use of, or provide access information for, any specific publicly available datasets for experimental evaluation within this paper.
Dataset Splits | No | The paper describes a software framework and its capabilities, including 'Available evaluators correspond to Prequential and Hold-Out evaluations'. However, it does not detail specific experiments conducted with datasets or specify any training/test/validation splits for reproduction.
Hardware Specification | No | The paper describes a software framework and its functionalities. It does not provide any specific details about the hardware used for development or for running any potential experiments.
Software Dependencies | No | scikit-multiflow builds upon popular open source frameworks including scikit-learn, MOA and MEKA. Development follows the FOSS principles. scikit-learn (Pedregosa et al., 2011) is the most popular open source software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forest, gradient boosting, k-means and DBSCAN, and is designed to inter-operate with the Python numerical and scientific packages NumPy and SciPy. The paper mentions several software frameworks and programming languages but does not provide specific version numbers for any of them (e.g., Python version, scikit-learn version, etc.).
Experiment Setup | No | The paper focuses on describing the scikit-multiflow framework, its architecture, and available methods and evaluators. It does not include a section detailing specific experimental setups, hyperparameters, or training configurations for any models or experiments.
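
The prequential (test-then-train) sequence quoted from Figure 1 can be illustrated with a minimal plain-Python sketch. It does not use the scikit-multiflow API: MajorityClassModel, toy_stream, prequential_evaluate and the report_every parameter are hypothetical names introduced only for this example. The sketch also assumes that partial_fit receives the true label alongside X, which the figure's partial_fit(X) step does not spell out.

```python
import random


class MajorityClassModel:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, X):
        # Before any training, fall back to label 0.
        if not self.counts:
            return 0
        return max(self.counts, key=self.counts.get)

    def partial_fit(self, X, y):
        # Incremental update using a single labelled sample.
        self.counts[y] = self.counts.get(y, 0) + 1
        return self


def toy_stream(n_samples, seed=0):
    """Hypothetical stream yielding (X, y) pairs; label 1 is drawn with probability 0.7."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        X = [rng.random(), rng.random()]
        y = 1 if rng.random() < 0.7 else 0
        yield X, y


def prequential_evaluate(stream, model, report_every=100):
    """Test-then-train loop: predict on each sample first, then train on it."""
    correct = 0
    seen = 0
    history = []  # one accuracy snapshot per `report_every` samples
    for X, y_true in stream:                 # while there is data in the stream
        y_predicted = model.predict(X)       # test on the new sample first
        correct += int(y_predicted == y_true)
        seen += 1
        if seen % report_every == 0:         # "m samples passed": update metrics
            history.append(correct / seen)
        model.partial_fit(X, y_true)         # then train on the same sample
    accuracy = correct / seen if seen else 0.0
    return accuracy, history


model = MajorityClassModel()
accuracy, history = prequential_evaluate(toy_stream(1000), model)
print(f"prequential accuracy: {accuracy:.3f}")
```

Because every sample is used for testing before it is used for training, no separate held-out split is needed, which is consistent with the table reporting no dataset splits.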