River: machine learning for streaming data in Python

Authors: Jacob Montiel, Max Halford, Saulo Martiello Mastelini, Geoffrey Bolmier, Raphael Sourty, Robin Vaysse, Adil Zouitine, Heitor Murilo Gomes, Jesse Read, Talel Abdessalem, Albert Bifet

JMLR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We benchmark the implementation of three algorithms available in scikit-learn (Pedregosa et al., 2011), Creme and scikit-multiflow: Gaussian Naive Bayes (GNB), Logistic Regression (LR) (Hastie et al., 2009), and Hoeffding Tree (HT) (Hulten et al., 2001). Table 1 shows similar accuracy between implementations (as expected) for all models. Table 2 shows the processing time (learn and predict). River models perform at least as fast as, and overall faster than, the rest. Tests are performed on the Elec2 data set (Harries and Wales, 1999), which has 45,312 samples with 8 numerical features. Reported processing time is the average of running the experiment 7 times on a system with a 2.4 GHz Quad-Core Intel Core i5 processor and 16 GB of RAM.
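The benchmark protocol quoted above (test-then-train over a stream, with timings averaged over 7 repetitions) can be sketched in plain Python. The synthetic stream and the running-majority classifier below are illustrative stand-ins under stated assumptions, not River's implementations or the Elec2 data:

```python
import random
import statistics
import time
from collections import Counter

def make_stream(n=1000, seed=42):
    """Toy two-class stream standing in for Elec2 (label correlates with x)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        yield {"x": x}, int(x > 0.5)

class MajorityClass:
    """Minimal online learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()
    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0
    def learn_one(self, x, y):
        self.counts[y] += 1

def prequential_run(model_factory, n=1000):
    """Test-then-train pass: predict each sample before learning from it."""
    model = model_factory()
    correct = total = 0
    start = time.perf_counter()
    for x, y in make_stream(n):
        correct += model.predict_one(x) == y
        total += 1
        model.learn_one(x, y)
    return correct / total, time.perf_counter() - start

# Average processing time over 7 repetitions, as in the paper's protocol.
times = [prequential_run(MajorityClass)[1] for _ in range(7)]
accuracy, _ = prequential_run(MajorityClass)
print(f"accuracy={accuracy:.3f}, mean time={statistics.mean(times):.4f}s")
```

The single pass both evaluates and trains, which is why streaming benchmarks report processing time rather than epochs; accuracy here hovers near chance because the stand-in model ignores the features.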
Researcher Affiliation Collaboration Jacob Montiel (AI Institute, University of Waikato, Hamilton, New Zealand; LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France); Max Halford (Alan, Paris, France); Saulo Martiello Mastelini (Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil); Geoffrey Bolmier (Volvo Car Corporation, Göteborg, Sweden); Raphael Sourty (IRIT, Université Paul Sabatier, Toulouse, France; Renault, Paris, France); Robin Vaysse (IRIT, Université Paul Sabatier, Toulouse, France; Octogone Lordat, Université Jean-Jaurès, Toulouse, France); Adil Zouitine (IRT Saint Exupéry, Toulouse, France); Heitor Murilo Gomes (AI Institute, University of Waikato, Hamilton, New Zealand); Jesse Read (LIX, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France); Talel Abdessalem (LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France); Albert Bifet (AI Institute, University of Waikato, Hamilton, New Zealand; LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France)
Pseudocode No The paper includes code examples for demonstrating the library's usage, but it does not contain structured pseudocode or algorithm blocks (e.g., a section explicitly labeled 'Algorithm' or 'Pseudocode').
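For context, the usage examples in the paper follow River's incremental interface, in which each sample is a plain dict and models expose `learn_one`/`predict_one`. The tiny running-mean regressor below only mimics that interface as a minimal sketch; it is not part of River:

```python
class RunningMeanRegressor:
    """Toy model with a River-style interface: consumes one dict sample at a time."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x: dict, y: float) -> "RunningMeanRegressor":
        # Incremental mean update; no samples are stored, as in streaming learning.
        self.n += 1
        self.mean += (y - self.mean) / self.n
        return self

    def predict_one(self, x: dict) -> float:
        return self.mean

model = RunningMeanRegressor()
stream = [({"f": 1.0}, 10.0), ({"f": 2.0}, 20.0), ({"f": 3.0}, 30.0)]
for x, y in stream:
    model.learn_one(x, y)
print(model.predict_one({"f": 4.0}))  # prints 20.0, the running mean of targets
```

The one-sample-at-a-time contract is what distinguishes this style of library from batch APIs such as scikit-learn's `fit`/`predict`.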
Open Source Code Yes The source code is available at https://github.com/online-ml/river.
Open Datasets Yes Tests are performed on the Elec2 data set (Harries and Wales, 1999) which has 45312 samples with 8 numerical features.
Dataset Splits No The paper mentions the Elec2 dataset and its size (45312 samples), but it does not provide specific details on how this dataset was split into training, validation, or test sets.
Hardware Specification Yes Reported processing time is the average of running the experiment 7 times on a system with a 2.4 GHz Quad-Core Intel Core i5 processor and 16GB of RAM.
Software Dependencies No The paper mentions Python and Cython as implementation languages and refers to scikit-learn, Creme, scikit-multiflow, and pandas.DataFrame, but it does not specify explicit version numbers for these software components used in the experiments.
Experiment Setup No The paper describes benchmarking algorithms and reporting their accuracy and processing time. However, it does not explicitly provide specific experimental setup details such as hyperparameters (e.g., learning rate, regularization strength, tree depth) or other training configurations for the algorithms benchmarked (GNB, LR, HT).