Quantifying Uncertainty in Online Regression Forests

Authors: Theodore Vasiloudis, Gianmarco De Francisci Morales, Henrik Boström

JMLR 2019

Reproducibility Variable Result LLM Response
Research Type: Experimental
"We provide empirical evidence that the methods outperform the state-of-the-art in terms of maintaining error guarantees, while being an order of magnitude faster. We also investigate how the algorithms are able to recover from concept drift. Keywords: Online learning, Uncertainty, Decision Trees, Regression..." "We report on an extensive experimental evaluation with thirty datasets of various sizes, including datasets that exhibit concept drift, and compare the proposed approaches against both a simple baseline and a state-of-the-art algorithm (Section 4)."
Researcher Affiliation: Academia
"Theodore Vasiloudis, EMAIL, RISE SICS, Stockholm, Sweden; Gianmarco De Francisci Morales, EMAIL, ISI Foundation, Torino, Italy; Henrik Boström, EMAIL, KTH Royal Institute of Technology, Stockholm, Sweden"
Pseudocode: Yes
Algorithm 1: Online CP Training(ℓ, (Xi, yi), λ)
Algorithm 2: Online CP Interval Prediction(ℓ, Xi, C, α)
Algorithm 3: Update Non-conformity Scores(ℓ, C)
Algorithm 4: Online QRF Training(ℓ, (Xi, yi), λ)
Algorithm 5: Online QRF Interval Prediction(ℓ, Xi, α)
Open Source Code: Yes
"All algorithm implementations, data, and experiment automation scripts used in this study are available as open source to ensure reproducibility at https://github.com/thvasilo/uncertain-trees-reproducible"
Open Datasets: Yes
"We use 20 small-scale datasets from a variety of domains to test the validity and efficiency of the algorithms, and 10 datasets to test their ability to deal with concept drift. The small-scale datasets are gathered from the OpenML repository (Vanschoren et al., 2013)... We also use a real-world dataset of flight delays. This dataset contains flight arrival and departure details for all commercial flights in the US during 2008..."
Dataset Splits: No
"The evaluation follows the prequential evaluation design that is commonly employed in online learning settings (Gama et al., 2009). For each example in the dataset, we first make a prediction, then update our metrics based on the predicted and true labels, and finally, reveal the true label to the algorithm. When reporting results over time for the concept drift datasets, we use tumbling windows of size 10 000, that is, we measure the mean performance of each algorithm for every 10 000 samples."
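The prequential (test-then-train) loop with tumbling windows described above can be sketched as follows. The `model` interface (`predict`/`learn`) and the absolute-error metric are illustrative assumptions for this sketch, not the paper's MOA implementation:

```python
def prequential_evaluate(model, stream, window_size=10_000):
    """Yield the mean absolute error for each tumbling window of the stream.

    For each (x, y) example: predict first, record the error, and only then
    reveal the true label to the learner.
    """
    window_errors = []
    for x, y in stream:
        y_hat = model.predict(x)              # 1. predict before seeing y
        window_errors.append(abs(y - y_hat))  # 2. update the metric
        model.learn(x, y)                     # 3. reveal the true label
        if len(window_errors) == window_size:
            yield sum(window_errors) / window_size
            window_errors = []
    if window_errors:                         # final partial window, if any
        yield sum(window_errors) / len(window_errors)
```

Because every example is used for testing exactly once before training, no explicit train/test split is needed, which is why the report marks "Dataset Splits" as No.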
Hardware Specification: No
The paper does not explicitly describe the specific hardware used to run its experiments, only noting that one of the baselines (Mondrian Forest) was implemented in Cython using NumPy primitives for performance.
Software Dependencies: No
"We have implemented these algorithms in the MOA online learning framework (Bifet et al., 2010a). For the Mondrian Forest, we use the optimized implementation available in the scikit-garden Python library. We have verified with the author of the original work that this implementation produces correct results in online learning mode for the parameter settings we use."
Experiment Setup: Yes
"For all the experiments, we use an ensemble size of 10. We use |C| = 1000 calibration examples for the conformal prediction algorithms, while for Online QRF we use quantile sketches at the leaves with the default accuracy parameter K = 200, as recommended by Karnin et al., which yields a normalized rank error of 1.65%. We set the requested significance level α to 0.1 for the validity experiments, but explore the effect of different significance levels in Section 4.5. Specifically, we set min_samples_split = 2, which determines the minimum number of samples necessary for a leaf to be considered for a split."
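To illustrate how a calibration set C of non-conformity scores turns a point prediction into an interval at significance level α = 0.1, here is a minimal split-conformal sketch. The function name and the choice of absolute residuals as non-conformity scores are assumptions for illustration, not the paper's exact MOA algorithm:

```python
import math

def conformal_interval(point_prediction, calibration_scores, alpha=0.1):
    """Interval that covers the true label with probability ~(1 - alpha),
    given non-conformity scores (e.g. absolute residuals) from a held-out
    calibration set C."""
    scores = sorted(calibration_scores)
    # 0-based rank of the ceil((1 - alpha) * (|C| + 1))-th smallest score
    k = math.ceil((1 - alpha) * (len(scores) + 1)) - 1
    half_width = scores[min(k, len(scores) - 1)]
    return point_prediction - half_width, point_prediction + half_width
```

With |C| = 1000 and α = 0.1, the half-width is roughly the 901st smallest calibration score; in the online setting, the paper's Algorithm 3 keeps these scores up to date as new examples arrive.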