Random Intersection Trees

Authors: Rajen Dinesh Shah, Nicolai Meinshausen

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we give two numerical examples to provide further insight into the performance of our method. The first is about learning the winning combinations for the well-known game Tic-Tac-Toe. ... The second example concerns text classification. ... Figure 3 shows the misclassification rates under situations with different numbers of added noise variables."
Researcher Affiliation | Academia | Rajen Dinesh Shah, Statistical Laboratory, University of Cambridge, Cambridge CB3 0WB, UK; Nicolai Meinshausen, Seminar für Statistik, ETH Zürich, 8092 Zürich, Switzerland
Pseudocode | Yes | Algorithm 1: A basic version of Random Intersection Trees; Algorithm 2: Random Intersection Trees with early stopping
Open Source Code | No | "We are currently working on such a version and plan to make it available soon."
Open Datasets | Yes | "The Tic-Tac-Toe endgame data set (Matheus and Rendell, 1989; Aha et al., 1991) contains all possible winning end states of the game Tic-Tac-Toe... The Reuters RCV1 text data contain the tf-idf (term frequency-inverse document frequency) weighted presence of 47,148 word-stems in each document; for details on the collection and processing of the original data, see Lewis et al. (2004)."
Dataset Splits | Yes | "We use half of the observations for training, and the other half for testing. ... we divide the documents into a training and test set with the first batch of 23,149 documents as training and the following 30,000 documents as test documents."
Hardware Specification | No | No specific hardware details for running the experiments (e.g., CPU/GPU models, memory) are mentioned in the paper.
Software Dependencies | No | The paper mentions an implementation in "pure R (R Core Team, 2013)" and algorithms such as CART and Random Forests, but provides no specific version numbers for software libraries or solvers beyond the R release implied by the citation year.
Experiment Setup | Yes | "We create two min-wise hash tables from the available observations in each of the classes, taking L = 200. ... 1000 iterations of Random Intersection Trees (with B = 5 samples as branching factor in each tree) that were selected by at least two trees. ... with a cut-off value θ0 = (3/20)pc and all remaining patterns S with a length less than or equal to 4 are retained. ... We generate 100 trees as in the Random Forests method: each is fit to subsampled training data using the CART algorithm restricted to depth 4..."
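To make the pseudocode entry concrete: the core idea of Algorithm 1 is to grow trees in which each node's itemset is the intersection of its parent's itemset with the features of a randomly drawn observation from the class of interest, so that patterns frequent in that class survive to the leaves. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the function names and the defaults (`depth`, `branching`) are ours.

```python
import random
from collections import Counter


def random_intersection_tree(class1_obs, depth=5, branching=5, rng=random):
    """One tree in the spirit of Algorithm 1 (a sketch, not the paper's code).

    class1_obs: list of frozensets, each the active binary features of one
    observation from the class of interest. Returns the non-empty leaf
    itemsets, which are candidate interaction patterns.
    """
    root = rng.choice(class1_obs)          # root: a random observation's features
    level = [frozenset(root)]
    for _ in range(depth - 1):
        nxt = []
        for node in level:
            if not node:                   # an empty intersection cannot recover
                continue
            for _ in range(branching):     # B children per node
                obs = rng.choice(class1_obs)
                nxt.append(node & frozenset(obs))  # intersect with a fresh draw
        level = nxt
    return [s for s in level if s]


def candidate_patterns(class1_obs, n_trees=100, **tree_kwargs):
    """Aggregate leaf itemsets over many trees; patterns that survive many
    random intersections are, with high probability, frequent in the class."""
    counts = Counter()
    for _ in range(n_trees):
        for leaf in random_intersection_tree(class1_obs, **tree_kwargs):
            counts[leaf] += 1
    return counts
```

In the paper, candidate patterns recovered this way are then filtered by their prevalence in the other class; this sketch covers only the tree-growing step.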
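The RCV1 features mentioned under Open Datasets are tf-idf weights. As background, here is a generic sketch of a plain tf-idf computation; the exact weighting and normalization used for RCV1 follow Lewis et al. (2004) and may differ in detail from this simple variant.

```python
import math
from collections import Counter


def tfidf(docs):
    """Plain tf-idf weights for tokenized documents (a generic sketch).

    docs: list of token lists. Returns, per document, a dict mapping each
    term t to tf(t, d) * log(N / df(t)), where df is document frequency.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights
```

A term appearing in every document gets weight zero, while rarer terms are up-weighted, which is the behaviour the RCV1 features rely on.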
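The "min-wise hash tables ... taking L = 200" in the experiment setup refer to min-wise hashing, which Algorithm 2 uses to approximate pattern prevalences cheaply. As background, here is the classic min-wise hashing identity (the match rate of min-hashes estimates Jaccard similarity); this is the standard technique, not the authors' exact prevalence estimator, and the helper names are ours.

```python
import random


def make_perms(universe, L=200, seed=0):
    """L random priority maps over the feature universe; each map plays the
    role of a random permutation of the features."""
    rng = random.Random(seed)
    return [{f: rng.random() for f in universe} for _ in range(L)]


def minwise_hashes(feature_set, perms):
    """Min-wise hash signature: for each random permutation, the minimum
    priority among the set's features."""
    return [min(p[f] for f in feature_set) for p in perms]


def jaccard_estimate(sig_a, sig_b):
    """Two sets share a min-hash value with probability equal to their
    Jaccard similarity, so the match rate over L hashes estimates it."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Storing only the L hash values per observation (L = 200 in the paper) lets set-overlap quantities be approximated without touching the full feature vectors, which is what makes the early-stopping variant fast.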