Random Intersection Trees

Authors: Rajen Dinesh Shah, Nicolai Meinshausen

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we give two numerical examples to provide further insight into the performance of our method. The first is about learning the winning combinations for the well-known game Tic-Tac-Toe. ... The second example concerns text classification. ... Figure 3 shows the misclassification rates under situations with different numbers of added noise variables."
Researcher Affiliation | Academia | Rajen Dinesh Shah, Statistical Laboratory, University of Cambridge, Cambridge CB3 0WB, UK; Nicolai Meinshausen, Seminar für Statistik, ETH Zürich, 8092 Zürich, Switzerland
Pseudocode | Yes | Algorithm 1: A basic version of Random Intersection Trees; Algorithm 2: Random Intersection Trees with early stopping
Open Source Code | No | "We are currently working on such a version and plan to make it available soon."
Open Datasets | Yes | "The Tic-Tac-Toe endgame data set (Matheus and Rendell, 1989; Aha et al., 1991) contains all possible winning end states of the game Tic-Tac-Toe... The Reuters RCV1 text data contain the tf-idf (term frequency-inverse document frequency) weighted presence of 47,148 word-stems in each document; for details on the collection and processing of the original data, see Lewis et al. (2004)."
Dataset Splits | Yes | "We use half of the observations for training, and the other half for testing. ... we divide the documents into a training and test set with the first batch of 23,149 documents as training and the following 30,000 documents as test documents."
Hardware Specification | No | No specific hardware details for running the experiments (e.g., CPU/GPU models, memory) are mentioned in the paper.
Software Dependencies | No | The paper mentions an implementation in "pure R (R Core Team, 2013)" and algorithms such as CART and Random Forests, but provides no specific version numbers for software libraries or solvers beyond the R release implied by the citation year.
Experiment Setup | Yes | "We create two min-wise hash tables from the available observations in each of the classes, taking L = 200. ... 1000 iterations of Random Intersection Trees (with B = 5 samples as branching factor in each tree) that were selected by at least two trees. ... with a cut-off value θ0 = (3/20)pc and all remaining patterns S with a length less than or equal to 4 are retained. ... We generate 100 trees as in the Random Forests method: each is fit to subsampled training data using the CART algorithm restricted to depth 4..."
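To make the pseudocode entry concrete: the core idea of Algorithm 1 is to grow trees in which each node's itemset is the intersection of its parent's itemset with the features of a randomly drawn observation from the class of interest, so that patterns frequent in that class survive to the leaves. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the function names and the defaults (`depth`, `branching`) are ours.

```python
import random
from collections import Counter


def random_intersection_tree(class1_obs, depth=5, branching=5, rng=random):
    """One tree in the spirit of Algorithm 1 (a sketch, not the paper's code).

    class1_obs: list of frozensets, each the active binary features of one
    observation from the class of interest. Returns the non-empty leaf
    itemsets, which are candidate interaction patterns.
    """
    root = rng.choice(class1_obs)          # root: a random observation's features
    level = [frozenset(root)]
    for _ in range(depth - 1):
        nxt = []
        for node in level:
            if not node:                   # an empty intersection cannot recover
                continue
            for _ in range(branching):     # B children per node
                obs = rng.choice(class1_obs)
                nxt.append(node & frozenset(obs))  # intersect with a fresh draw
        level = nxt
    return [s for s in level if s]


def candidate_patterns(class1_obs, n_trees=100, **tree_kwargs):
    """Aggregate leaf itemsets over many trees; patterns that survive many
    random intersections are, with high probability, frequent in the class."""
    counts = Counter()
    for _ in range(n_trees):
        for leaf in random_intersection_tree(class1_obs, **tree_kwargs):
            counts[leaf] += 1
    return counts
```

In the paper, candidate patterns recovered this way are then filtered by their prevalence in the other class; this sketch covers only the tree-growing step.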
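The RCV1 features mentioned under Open Datasets are tf-idf weights. As background, here is a generic sketch of a plain tf-idf computation; the exact weighting and normalization used for RCV1 follow Lewis et al. (2004) and may differ in detail from this simple variant.

```python
import math
from collections import Counter


def tfidf(docs):
    """Plain tf-idf weights for tokenized documents (a generic sketch).

    docs: list of token lists. Returns, per document, a dict mapping each
    term t to tf(t, d) * log(N / df(t)), where df is document frequency.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights
```

A term appearing in every document gets weight zero, while rarer terms are up-weighted, which is the behaviour the RCV1 features rely on.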
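The "min-wise hash tables ... taking L = 200" in the experiment setup refer to min-wise hashing, which Algorithm 2 uses to approximate pattern prevalences cheaply. As background, here is the classic min-wise hashing identity (the match rate of min-hashes estimates Jaccard similarity); this is the standard technique, not the authors' exact prevalence estimator, and the helper names are ours.

```python
import random


def make_perms(universe, L=200, seed=0):
    """L random priority maps over the feature universe; each map plays the
    role of a random permutation of the features."""
    rng = random.Random(seed)
    return [{f: rng.random() for f in universe} for _ in range(L)]


def minwise_hashes(feature_set, perms):
    """Min-wise hash signature: for each random permutation, the minimum
    priority among the set's features."""
    return [min(p[f] for f in feature_set) for p in perms]


def jaccard_estimate(sig_a, sig_b):
    """Two sets share a min-hash value with probability equal to their
    Jaccard similarity, so the match rate over L hashes estimates it."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Storing only the L hash values per observation (L = 200 in the paper) lets set-overlap quantities be approximated without touching the full feature vectors, which is what makes the early-stopping variant fast.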