Confidence Decision Trees via Online and Active Learning for Streaming Data

Authors: Rocco De Rosa, Nicolò Cesa-Bianchi

JAIR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on real and synthetic data in a streaming setting show that our trees are indeed more accurate than trees with the same number of leaves generated by state-of-the-art techniques."
Researcher Affiliation | Academia | Rocco De Rosa (EMAIL), Dipartimento di Informatica, Università degli Studi di Milano, 20135 Milano, Italy; Nicolò Cesa-Bianchi (EMAIL), Dipartimento di Informatica, Università degli Studi di Milano, 20135 Milano, Italy
Pseudocode | Yes | Algorithm 1 (C-Tree; Input: threshold τ > 0 ...); Algorithm 2 (Query Strategy); Algorithm 3 (Online Stream Validation Protocol); Algorithm 4 (Random Strategy); Algorithm 5 (Variable Uncertainty Strategy); Algorithm 6 (Split Strategy); Algorithm 7 (Rand CBT; Input: tree T, total number of leaves num-leaves, number of attributes d, leaf class-conditional probability q; Output: complete binary tree T)
Open Source Code | No | The paper mentions using and modifying the H-Tree algorithm implemented in MOA, but does not state that the modified code is open source or provide a link to it.
Open Datasets | Yes | A9A, COD-RNA, and COVERTYPE are from the LIBSVM binary classification repository; AIRLINES and ELECTRICITY are from the MOA collection.
Dataset Splits | No | The paper describes experiments in a streaming setting using "Interleaved Test-Then-Train validation in MOA" and creating "ten different streams from each dataset ... by taking a random permutation of the examples in it." This implies continuous, sequential processing of the data rather than fixed train/test/validation splits.
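The Interleaved Test-Then-Train (prequential) protocol mentioned above can be sketched as follows. This is a minimal illustration, not the paper's C-Tree: `MajorityClass` is a toy stand-in learner, and all names here are hypothetical.

```python
import random
from collections import Counter

class MajorityClass:
    """Toy online learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()
    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None
    def update(self, x, y):
        self.counts[y] += 1

def interleaved_test_then_train(data, make_model, n_permutations=10, seed=0):
    """Prequential evaluation as in MOA: each arriving example is first used
    to test the model, then to train it. Following the paper's setup, the
    dataset is turned into n_permutations streams by random shuffling, and
    online accuracy is averaged over the streams."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_permutations):
        stream = list(data)
        rng.shuffle(stream)        # one random permutation = one stream
        model = make_model()       # fresh model for each stream
        correct = 0
        for x, y in stream:
            correct += model.predict(x) == y   # test first ...
            model.update(x, y)                 # ... then train
        accuracies.append(correct / len(stream))
    return sum(accuracies) / len(accuracies)
```

Because every example is tested before it is trained on, the protocol needs no held-out split, which is why the "Dataset Splits" variable is marked No.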
Hardware Specification | No | The paper does not specify the hardware used for the experiments.
Software Dependencies | No | The paper mentions that the Hoeffding Tree (H-Tree) algorithm is implemented in MOA and that the authors modified the H-Tree code in MOA, but it gives no version number for MOA or any other software dependency.
Experiment Setup | Yes | The grace period parameter was set to 100. In contrast to the typical experimental settings in the literature, we did not consider the tie-break parameter because in the experiments we observed that it caused the majority of the splits. Based on Theorem 4 and Remark 1 (which leads to the choice δ_t = 1/t), we used the following version of our confidence bounds ε_KM and ε_Gini (the bound for ε_ent contains an extra ln m factor): ε̃_KM = ε̃_Gini = c √((1/m) ln(m² h_t² d)) (7), where the parameter c is used to control the number of splits. ... The parameters δ in H-Tree and Corr H-Tree, and c in C-Tree ... were individually tuned on each dataset using a grid of 200 values, hence plots show the online performance of each algorithm when it is close to optimally tuned. The ranges for the parameters are c ∈ (0, 2) and δ ∈ (0, 1). The optimal values of c and δ found in the tuning phase are typically around 5·10⁻³ and 10⁻², respectively. ... As explained by Zliobaite et al. (2014), the parameter s can be safely set to a default value of 0.01. We performed all the experiments with this setting. ... In the experiments we set the parameter ν = 0.2.
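A small numerical illustration of the tuned confidence bound quoted above. The exact grouping of terms under the logarithm, ln(m² h_t² d), is an assumption reconstructed from the flattened notation in the source; the helper name and sample values are illustrative only.

```python
import math

def confidence_bound(m, h_t, d, c):
    """Reconstructed form of Eq. (7): eps = c * sqrt(ln(m^2 * h_t^2 * d) / m).

    m   -- number of examples observed at the leaf
    h_t -- (assumed) tree-depth/time-dependent factor from delta_t = 1/t
    d   -- number of attributes
    c   -- tuning constant controlling the number of splits, c in (0, 2)
    """
    return c * math.sqrt(math.log(m**2 * h_t**2 * d) / m)
```

With c near its typically optimal value of 5·10⁻³, the bound shrinks as more examples m reach a leaf, so splits become easier to certify as the stream progresses.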