Random Intersection Trees
Authors: Rajen Dinesh Shah, Nicolai Meinshausen
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we give two numerical examples to provide further insight into the performance of our method. The first is about learning the winning combinations for the well-known game Tic-Tac-Toe. ... The second example concerns text classification. ... Figure 3 shows the misclassification rates under situations with different numbers of added noise variables. |
| Researcher Affiliation | Academia | Rajen Dinesh Shah EMAIL Statistical Laboratory, University of Cambridge, Cambridge, CB3 0WB, UK; Nicolai Meinshausen EMAIL Seminar für Statistik, ETH Zürich, 8092 Zürich, Switzerland |
| Pseudocode | Yes | Algorithm 1 A basic version of Random Intersection Trees; Algorithm 2 Random Intersection Trees with early stopping |
| Open Source Code | No | We are currently working on such a version and plan to make it available soon. |
| Open Datasets | Yes | The Tic-Tac-Toe endgame data set (Matheus and Rendell, 1989; Aha et al., 1991) contains all possible winning end states of the game Tic-Tac-Toe... The Reuters RCV1 text data contain the tf-idf (term frequency-inverse document frequency) weighted presence of 47,148 word-stems in each document; for details on the collection and processing of the original data, see Lewis et al. (2004). |
| Dataset Splits | Yes | We use half of the observations for training, and the other half for testing. ... we divide the documents into a training and test set with the first batch of 23,149 documents as training and the following 30000 documents as test documents. |
| Hardware Specification | No | No specific hardware details for running the experiments (e.g., GPU/CPU models, memory) are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'pure R (R Core Team, 2013)' and algorithms like 'CART algorithm' and 'Random Forests', but does not provide specific version numbers for software libraries or solvers used, other than the implied version for R from its citation year. |
| Experiment Setup | Yes | We create two min-wise hash tables from the available observations in each of the classes, taking L = 200. ... 1000 iterations of Random Intersection Trees (with B = 5 samples as branching factor in each tree) that were selected by at least two trees. ... with a cut-off value θ0 = (3/20)pc and all remaining patterns S with a length less than or equal to 4 are retained. ... We generate 100 trees as in the Random Forests method: each is fit to subsampled training data using CART algorithm restricted to depth 4... |
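
Since no code was released, the following Python sketch illustrates the intersection idea behind the basic version of Random Intersection Trees named in the Pseudocode row, using the settings quoted in the Experiment Setup row (1000 trees, branching factor B = 5, patterns kept if found by at least two trees and of length at most 4). It is a minimal illustration, not the authors' implementation: the function names, the `depth=3` default, and the representation of observations as sets of active variables are assumptions, and it omits early stopping and the min-wise hashing step used to check prevalence in the other class.

```python
import random
from collections import Counter


def random_intersection_tree(observations, depth, branching, rng):
    """Grow one tree and return the distinct non-empty leaf intersections.

    observations: list of frozensets, each the active variables of one
    observation from the class of interest (assumed representation).
    """
    # Root node: the active set of one randomly chosen observation.
    level = [rng.choice(observations)]
    for _ in range(depth):
        next_level = []
        for node in level:
            for _ in range(branching):
                # Child node: parent's set intersected with a random observation.
                child = node & rng.choice(observations)
                if child:  # an empty set can never grow again, so drop it
                    next_level.append(child)
        level = next_level
    return set(level)


def candidate_patterns(observations, n_trees=1000, depth=3, branching=5,
                       min_trees=2, max_len=4, seed=0):
    """Collect patterns that survive in at least `min_trees` trees.

    depth=3 is an illustrative assumption; the other defaults mirror the
    values quoted in the table above.
    """
    rng = random.Random(seed)
    obs = [frozenset(o) for o in observations]
    counts = Counter()
    for _ in range(n_trees):
        # Each tree contributes at most one vote per distinct pattern.
        counts.update(random_intersection_tree(obs, depth, branching, rng))
    return {s for s, c in counts.items()
            if c >= min_trees and len(s) <= max_len}


# Toy data: every observation shares variables 1 and 2 plus one unique variable.
toy = [{1, 2, 100 + i} for i in range(50)]
patterns = candidate_patterns(toy, n_trees=20)
```

On the toy data, the shared pair of variables survives repeated intersection while the observation-specific variables are intersected away, which is the property the method relies on to find frequent interactions.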