Modelling Interactions in High-dimensional Data with Backtracking
Authors: Rajen D. Shah
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of our method when applied to regression and classification problems is demonstrated on simulated and real data sets. |
| Researcher Affiliation | Academia | Rajen D. Shah, Statistical Laboratory, University of Cambridge, Cambridge, CB3 0WB, UK |
| Pseudocode | Yes | Algorithm 1 A naive version of Backtracking with the Lasso |
| Open Source Code | Yes | An R (R Development Core Team, 2005) package for the method is available on the author's website. |
| Open Datasets | Yes | This data set, available at http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized, contains crime statistics... This data set is available from http://archive.ics.uci.edu/ml/datasets/ISOLET; see Fanty and Cole (1991) for more background on the data. |
| Dataset Splits | Yes | To evaluate the procedures, we randomly selected 2/3 for training and the remaining 1/3 was used for testing. This was repeated 200 times for each of the data sets. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. It mentions computational efficiency and speeds, but no specific GPU/CPU models or other hardware details. |
| Software Dependencies | No | The paper mentions 'An R (R Development Core Team, 2005) package' and references other methods such as 'hierNet (Bien et al., 2013)' and 'MARS (Friedman, 1991) (implemented using Hastie et al. (2013))', but it does not provide specific version numbers for the software dependencies used to replicate the experiments, beyond the year for the R reference. |
| Experiment Setup | Yes | For the iterated Lasso fits, we repeated the following process. Given a design matrix, first fit the Lasso. Then apply 5-fold cross-validation to give a λ value and associated active set. Finally add all interactions between variables in this active set to the design matrix, ready for the next iteration... To select the tuning parameters of the methods we used cross-validation, randomly selecting 5 folds but repeating this a total of 5 times to reduce the variance of the cross-validation scores. When using Backtracking, the size of the active set was restricted to 50 and the size of Ck to p + 50 × 49/2 = p + 1225, so T was at most 50. For Main, Backtracking, Iterates, Screening and hierNet, we employed 5-fold cross-validation with squared error loss to select tuning parameters. For MARS we used the default settings for pruning the final fits using generalised cross-validation. With Random Forests, we used the default settings on both data sets. In all of the methods except Random Forests, we only included first-order interactions. |
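The evaluation protocol quoted above (a random 2/3–1/3 train/test split, an iterated Lasso fit with 5-fold cross-validation to pick λ, then growing the design matrix with pairwise interactions of the active set) can be sketched as follows. This is a minimal NumPy-only illustration, not the paper's R package or its Backtracking algorithm: the toy data, the λ grid, and the coordinate-descent Lasso solver are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, g):
    # Soft-thresholding operator used by coordinate-descent Lasso.
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Minimal coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam*||b||_1.
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                              # partial residual
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def cv_lambda(X, y, lams, k=5, seed=0):
    # k-fold cross-validation with squared-error loss to select lambda.
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    errs = []
    for lam in lams:
        err = 0.0
        for f in range(k):
            tr, te = folds != f, folds == f
            b = lasso_cd(X[tr], y[tr], lam)
            err += ((y[te] - X[te] @ b) ** 2).sum()
        errs.append(err)
    return lams[int(np.argmin(errs))]

def add_interactions(X, active):
    # Append all pairwise interaction columns between active variables.
    inter = [X[:, i] * X[:, j] for a, i in enumerate(active) for j in active[a + 1:]]
    return np.hstack([X] + [c[:, None] for c in inter]) if inter else X

# Toy data (an assumption for the sketch): y depends on x0, x1 and x0*x1.
rng = np.random.default_rng(1)
n, p = 120, 10
X = rng.standard_normal((n, p))
y = X[:, 0] + X[:, 1] + 2 * X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)

train = rng.permutation(n) < (2 * n) // 3    # random 2/3 train, 1/3 test
lams = [0.01, 0.05, 0.1, 0.5]                # assumed lambda grid

Xd = X
for it in range(2):                          # two rounds of the iterated scheme
    lam = cv_lambda(Xd[train], y[train], lams)
    b = lasso_cd(Xd[train], y[train], lam)
    active = list(np.flatnonzero(np.abs(b) > 1e-8))
    if it == 0:
        Xd = add_interactions(Xd, active)    # grow the design, as in the paper

mse = float(np.mean((y[~train] - Xd[~train] @ b) ** 2))
```

In the paper this split-fit-evaluate cycle is repeated 200 times per data set and the cross-validation itself is repeated 5 times to reduce variance; the sketch runs a single pass to keep the structure visible.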