Confidence Decision Trees via Online and Active Learning for Streaming Data
Authors: Rocco De Rosa, Nicolò Cesa-Bianchi
JAIR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real and synthetic data in a streaming setting show that our trees are indeed more accurate than trees with the same number of leaves generated by state-of-the-art techniques. |
| Researcher Affiliation | Academia | Rocco De Rosa EMAIL Dipartimento di Informatica, Università degli Studi di Milano, 20135 Milano, Italy; Nicolò Cesa-Bianchi EMAIL Dipartimento di Informatica, Università degli Studi di Milano, 20135 Milano, Italy |
| Pseudocode | Yes | Algorithm 1 C-Tree (Input: Threshold τ > 0...); Algorithm 2 Query Strategy; Algorithm 3 Online Stream Validation Protocol; Algorithm 4 Random Strategy; Algorithm 5 Variable Uncertainty Strategy; Algorithm 6 Split Strategy; Algorithm 7 Rand CBT (Input: tree T, total number of leaves num-leaves, number of attributes d, leaf class conditional probability q; Output: complete binary tree T) |
| Open Source Code | No | The paper mentions using and modifying the H-Tree algorithm implemented in MOA, but does not provide a link to, or statement about, open-sourcing their own modified code. |
| Open Datasets | Yes | A9A, COD-RNA and COVERTYPE are from the LIBSVM binary classification repository. AIRLINES and ELECTRICITY are from the MOA collection. |
| Dataset Splits | No | The paper describes experiments in a streaming setting using "Interleaved Test-Then-Train validation in MOA" and creating "ten different streams from each dataset ... by taking a random permutation of the examples in it." This implies a continuous, sequential processing of data rather than fixed train/test/validation splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions that the Hoeffding Tree (H-Tree) algorithm is implemented in MOA and that they modified the H-Tree code in MOA, but it does not specify a version number for MOA or any other software dependencies. |
| Experiment Setup | Yes | The grace period parameter was set to 100. In contrast to the typical experimental settings in the literature, we did not consider the tie-break parameter because in the experiments we observed that it caused the majority of the splits. Based on Theorem 4 and Remark 1 (which leads to the choice δ_t = 1/t), we used the following version of our confidence bounds ε_KM and ε_Gini (the bound for ε_ent contains an extra ln m factor): ε̃_KM = ε̃_Gini = c √((1/m) ln(m² h_t² d)) (7), where the parameter c is used to control the number of splits. ... The parameters δ in H-Tree and Corr H-Tree, and c in C-Tree ... were individually tuned on each dataset using a grid of 200 values; hence plots show the online performance of each algorithm when it is close to optimally tuned. The ranges for the parameters are c ∈ (0, 2) and δ ∈ (0, 1). The optimal values of c and δ produced in the tuning phase are typically around 5·10⁻³ and 10⁻², respectively. ... As explained by Zliobaite et al. (2014), the parameter s can be safely set to a default value 0.01. We performed all the experiments with this setting. ... In the experiments we set the parameter ν = 0.2. |