Quantifying Uncertainty in Online Regression Forests

Authors: Theodore Vasiloudis, Gianmarco De Francisci Morales, Henrik Boström

JMLR 2019

Reproducibility Variable Result LLM Response
Research Type: Experimental
"We provide empirical evidence that the methods outperform the state-of-the-art in terms of maintaining error guarantees, while being an order of magnitude faster. We also investigate how the algorithms are able to recover from concept drift. Keywords: Online learning, Uncertainty, Decision Trees, Regression..." "We report on an extensive experimental evaluation with thirty datasets of various sizes, including datasets that exhibit concept drift, and compare the proposed approaches against both a simple baseline and a state-of-the-art algorithm (Section 4)."
Researcher Affiliation: Academia
"Theodore Vasiloudis, EMAIL, RISE SICS, Stockholm, Sweden; Gianmarco De Francisci Morales, EMAIL, ISI Foundation, Torino, Italy; Henrik Boström, EMAIL, KTH Royal Institute of Technology, Stockholm, Sweden"
Pseudocode: Yes
Algorithm 1: Online CP Training(ℓ, (Xi, yi), λ)
Algorithm 2: Online CP Interval Prediction(ℓ, Xi, C, α)
Algorithm 3: Update Non-conformity Scores(ℓ, C)
Algorithm 4: Online QRF Training(ℓ, (Xi, yi), λ)
Algorithm 5: Online QRF Interval Prediction(ℓ, Xi, α)
Open Source Code: Yes
"All algorithm implementations, data, and experiment automation scripts used in this study are available as open source to ensure reproducibility at https://github.com/thvasilo/uncertain-trees-reproducible"
Open Datasets: Yes
"We use 20 small-scale datasets from a variety of domains to test the validity and efficiency of the algorithms, and 10 datasets to test their ability to deal with concept drift. The small-scale datasets are gathered from the OpenML repository (Vanschoren et al., 2013)... We also use a real-world dataset of flight delays. This dataset contains flight arrival and departure details for all commercial flights in the US during 2008..."
Dataset Splits: No
"The evaluation follows the prequential evaluation design that is commonly employed in online learning settings (Gama et al., 2009). For each example in the dataset, we first make a prediction, then update our metrics based on the predicted and true labels, and finally, reveal the true label to the algorithm. When reporting results over time for the concept drift datasets, we use tumbling windows of size 10 000, that is, we measure the mean performance of each algorithm for every 10 000 samples."
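The prequential (test-then-train) loop with tumbling windows described above can be sketched as follows. The `model` interface (`predict`/`learn`) and the absolute-error metric are illustrative assumptions for this sketch, not the paper's MOA implementation:

```python
def prequential_evaluate(model, stream, window_size=10_000):
    """Yield the mean absolute error for each tumbling window of the stream.

    For each (x, y) example: predict first, record the error, and only then
    reveal the true label to the learner.
    """
    window_errors = []
    for x, y in stream:
        y_hat = model.predict(x)              # 1. predict before seeing y
        window_errors.append(abs(y - y_hat))  # 2. update the metric
        model.learn(x, y)                     # 3. reveal the true label
        if len(window_errors) == window_size:
            yield sum(window_errors) / window_size
            window_errors = []
    if window_errors:                         # final partial window, if any
        yield sum(window_errors) / len(window_errors)
```

Because every example is used for testing exactly once before training, no explicit train/test split is needed, which is why the report marks "Dataset Splits" as No.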
Hardware Specification: No
The paper does not explicitly describe the specific hardware used to run its experiments, only noting that one of the baselines (Mondrian Forest) was implemented in Cython using NumPy primitives for performance.
Software Dependencies: No
"We have implemented these algorithms in the MOA online learning framework (Bifet et al., 2010a). For the Mondrian Forest, we use the optimized implementation available in the scikit-garden Python library. We have verified with the author of the original work that this implementation produces correct results in online learning mode for the parameter settings we use."
Experiment Setup: Yes
"For all the experiments, we use an ensemble size of 10. We use |C| = 1000 calibration examples for the conformal prediction algorithms, while for Online QRF we use quantile sketches at the leaves with the default accuracy parameter K = 200, as recommended by Karnin et al., which yields a normalized rank error of 1.65%. We set the requested significance level α to 0.1 for the validity experiments, but explore the effect of different significance levels in Section 4.5. Specifically, we set min_samples_split = 2, which determines the minimum number of samples necessary for a leaf to be considered for a split."
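To illustrate how a calibration set C of non-conformity scores turns a point prediction into an interval at significance level α = 0.1, here is a minimal split-conformal sketch. The function name and the choice of absolute residuals as non-conformity scores are assumptions for illustration, not the paper's exact MOA algorithm:

```python
import math

def conformal_interval(point_prediction, calibration_scores, alpha=0.1):
    """Interval that covers the true label with probability ~(1 - alpha),
    given non-conformity scores (e.g. absolute residuals) from a held-out
    calibration set C."""
    scores = sorted(calibration_scores)
    # 0-based rank of the ceil((1 - alpha) * (|C| + 1))-th smallest score
    k = math.ceil((1 - alpha) * (len(scores) + 1)) - 1
    half_width = scores[min(k, len(scores) - 1)]
    return point_prediction - half_width, point_prediction + half_width
```

With |C| = 1000 and α = 0.1, the half-width is roughly the 901st smallest calibration score; in the online setting, the paper's Algorithm 3 keeps these scores up to date as new examples arrive.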