Modelling Interactions in High-dimensional Data with Backtracking
Authors: Rajen D. Shah
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of our method when applied to regression and classification problems is demonstrated on simulated and real data sets. |
| Researcher Affiliation | Academia | Rajen D. Shah, Statistical Laboratory, University of Cambridge, Cambridge, CB3 0WB, UK |
| Pseudocode | Yes | Algorithm 1 A naive version of Backtracking with the Lasso |
| Open Source Code | Yes | An R (R Development Core Team, 2005) package for the method is available on the author's website. |
| Open Datasets | Yes | This data set, available at http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized, contains crime statistics... This data set is available from http://archive.ics.uci.edu/ml/datasets/ISOLET; see Fanty and Cole (1991) for more background on the data. |
| Dataset Splits | Yes | To evaluate the procedures, we randomly selected 2/3 for training and the remaining 1/3 was used for testing. This was repeated 200 times for each of the data sets. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. It mentions computational efficiency and speeds, but no specific GPU/CPU models or other hardware details. |
| Software Dependencies | No | The paper mentions 'An R (R Development Core Team, 2005) package' and references other methods such as 'hierNet (Bien et al., 2013)' and 'MARS (Friedman, 1991) (implemented using Hastie et al. (2013))', but it does not provide specific version numbers for the software dependencies used to replicate the experiments, beyond the year for the R reference. |
| Experiment Setup | Yes | For the iterated Lasso fits, we repeated the following process. Given a design matrix, first fit the Lasso. Then apply 5-fold cross-validation to give a λ value and associated active set. Finally add all interactions between variables in this active set to the design matrix, ready for the next iteration... To select the tuning parameters of the methods we used cross-validation, randomly selecting 5 folds but repeating this a total of 5 times to reduce the variance of the cross-validation scores. When using Backtracking, the size of the active set was restricted to 50 and the size of Ck to p + 50 × 49/2 = p + 1225, so T was at most 50. For Main, Backtracking, Iterates, Screening and hierNet, we employed 5-fold cross-validation with squared error loss to select tuning parameters. For MARS we used the default settings for pruning the final fits using generalised cross-validation. With Random Forests, we used the default settings on both data sets. In all of the methods except Random Forests, we only included first-order interactions. |
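The evaluation protocol quoted above (a random 2/3–1/3 train/test split, an iterated Lasso fit with 5-fold cross-validation to pick λ, then growing the design matrix with pairwise interactions of the active set) can be sketched as follows. This is a minimal NumPy-only illustration, not the paper's R package or its Backtracking algorithm: the toy data, the λ grid, and the coordinate-descent Lasso solver are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, g):
    # Soft-thresholding operator used by coordinate-descent Lasso.
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Minimal coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam*||b||_1.
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                              # partial residual
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def cv_lambda(X, y, lams, k=5, seed=0):
    # k-fold cross-validation with squared-error loss to select lambda.
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    errs = []
    for lam in lams:
        err = 0.0
        for f in range(k):
            tr, te = folds != f, folds == f
            b = lasso_cd(X[tr], y[tr], lam)
            err += ((y[te] - X[te] @ b) ** 2).sum()
        errs.append(err)
    return lams[int(np.argmin(errs))]

def add_interactions(X, active):
    # Append all pairwise interaction columns between active variables.
    inter = [X[:, i] * X[:, j] for a, i in enumerate(active) for j in active[a + 1:]]
    return np.hstack([X] + [c[:, None] for c in inter]) if inter else X

# Toy data (an assumption for the sketch): y depends on x0, x1 and x0*x1.
rng = np.random.default_rng(1)
n, p = 120, 10
X = rng.standard_normal((n, p))
y = X[:, 0] + X[:, 1] + 2 * X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)

train = rng.permutation(n) < (2 * n) // 3    # random 2/3 train, 1/3 test
lams = [0.01, 0.05, 0.1, 0.5]                # assumed lambda grid

Xd = X
for it in range(2):                          # two rounds of the iterated scheme
    lam = cv_lambda(Xd[train], y[train], lams)
    b = lasso_cd(Xd[train], y[train], lam)
    active = list(np.flatnonzero(np.abs(b) > 1e-8))
    if it == 0:
        Xd = add_interactions(Xd, active)    # grow the design, as in the paper

mse = float(np.mean((y[~train] - Xd[~train] @ b) ** 2))
```

In the paper this split-fit-evaluate cycle is repeated 200 times per data set and the cross-validation itself is repeated 5 times to reduce variance; the sketch runs a single pass to keep the structure visible.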