River: machine learning for streaming data in Python
Authors: Jacob Montiel, Max Halford, Saulo Martiello Mastelini, Geoffrey Bolmier, Raphael Sourty, Robin Vaysse, Adil Zouitine, Heitor Murilo Gomes, Jesse Read, Talel Abdessalem, Albert Bifet
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark the implementation of 3 algorithms available in scikit-learn (Pedregosa et al., 2011), Creme and scikit-multiflow: Gaussian Naive Bayes (GNB), Logistic Regression (LR) (Hastie et al., 2009), and Hoeffding Tree (HT) (Hulten et al., 2001). Table 1 shows similar accuracy between implementations (as expected) for all models. Table 2 shows the processing time (learn and predict). River models perform at least as fast as, and overall faster than, the rest. Tests are performed on the Elec2 data set (Harries and Wales, 1999), which has 45312 samples with 8 numerical features. Reported processing time is the average of running the experiment 7 times on a system with a 2.4 GHz Quad-Core Intel Core i5 processor and 16GB of RAM. |
| Researcher Affiliation | Collaboration | Jacob Montiel EMAIL AI Institute, University of Waikato, Hamilton, New Zealand; LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France. Max Halford EMAIL Alan, Paris, France. Saulo Martiello Mastelini EMAIL Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil. Geoffrey Bolmier EMAIL Volvo Car Corporation, Göteborg, Sweden. Raphael Sourty EMAIL IRIT, Université Paul Sabatier, Toulouse, France; Renault, Paris, France. Robin Vaysse EMAIL IRIT, Université Paul Sabatier, Toulouse, France; Octogone Lordat, Université Jean-Jaurès, Toulouse, France. Adil Zouitine EMAIL IRT Saint Exupéry, Toulouse, France. Heitor Murilo Gomes EMAIL AI Institute, University of Waikato, Hamilton, New Zealand. Jesse Read EMAIL LIX, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France. Talel Abdessalem EMAIL LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France. Albert Bifet EMAIL AI Institute, University of Waikato, Hamilton, New Zealand; LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France. |
| Pseudocode | No | The paper includes code examples for demonstrating the library's usage, but it does not contain structured pseudocode or algorithm blocks (e.g., a section explicitly labeled 'Algorithm' or 'Pseudocode'). |
| Open Source Code | Yes | The source code is available at https://github.com/online-ml/river. |
| Open Datasets | Yes | Tests are performed on the Elec2 data set (Harries and Wales, 1999) which has 45312 samples with 8 numerical features. |
| Dataset Splits | No | The paper mentions the Elec2 dataset and its size (45312 samples), but it does not provide specific details on how this dataset was split into training, validation, or test sets. |
| Hardware Specification | Yes | Reported processing time is the average of running the experiment 7 times on a system with a 2.4 GHz Quad-Core Intel Core i5 processor and 16GB of RAM. |
| Software Dependencies | No | The paper mentions Python and Cython as implementation languages and refers to scikit-learn, Creme, scikit-multiflow, and pandas.DataFrame, but it does not specify explicit version numbers for these software components used in the experiments. |
| Experiment Setup | No | The paper describes benchmarking algorithms and reporting their accuracy and processing time. However, it does not explicitly provide specific experimental setup details such as hyperparameters (e.g., learning rate, regularization strength, tree depth) or other training configurations for the algorithms benchmarked (GNB, LR, HT). |
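The benchmark rows above rest on the test-then-train (prequential) evaluation loop that streaming libraries like River use: each incoming sample is first used for prediction, then for learning, so accuracy is measured on data the model has never seen. The sketch below illustrates that loop with a hand-rolled online logistic regression over dict-valued features (mirroring River's one-sample-at-a-time style) and a synthetic stream; it is an illustration only, not River's implementation, and the learner, stream, and hyperparameters (learning rate 0.05) are assumptions for the example.

```python
# Minimal sketch of test-then-train (prequential) evaluation, the paradigm
# behind the paper's streaming benchmarks. The tiny SGD logistic regression
# and the synthetic stream are stand-ins, not River's code.
import math
import random


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time via SGD."""

    def __init__(self, lr=0.05):
        self.lr = lr
        self.weights = {}  # feature name -> weight
        self.bias = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights.get(f, 0.0) * v for f, v in x.items())
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # Gradient of the log loss for a single example.
        error = self.predict_proba_one(x) - y
        self.bias -= self.lr * error
        for f, v in x.items():
            self.weights[f] = self.weights.get(f, 0.0) - self.lr * error * v


def stream(n=2000, seed=42):
    """Synthetic binary stream: label follows a noisy linear rule."""
    rng = random.Random(seed)
    for _ in range(n):
        x = {"a": rng.uniform(-1, 1), "b": rng.uniform(-1, 1)}
        y = 1 if x["a"] - 0.5 * x["b"] + rng.gauss(0, 0.1) > 0 else 0
        yield x, y


model = OnlineLogisticRegression()
correct = total = 0
for x, y in stream():
    y_pred = model.predict_proba_one(x) > 0.5  # test first...
    correct += int(y_pred == y)
    total += 1
    model.learn_one(x, y)                      # ...then train
accuracy = correct / total
print(f"prequential accuracy: {accuracy:.3f}")
```

Because every prediction happens before the corresponding update, a single pass over the stream yields both the trained model and its evaluation, which is why no train/validation/test split is reported in the paper.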