Lasso Screening Rules via Dual Polytope Projection
Authors: Jie Wang, Peter Wonka, Jieping Ye
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have evaluated our screening rule using synthetic and real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso. ... We evaluate our screening rules on synthetic and real data sets from many different applications. In Section 4, the experimental results demonstrate that our rules are more effective in discarding inactive features than existing state-of-the-art screening rules. |
| Researcher Affiliation | Academia | Jie Wang, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA; Peter Wonka, Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287-8809, USA; Jieping Ye, Department of Computational Medicine and Bioinformatics, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2218, USA |
| Pseudocode | No | The paper describes algorithms and methods but does not present them in explicit pseudocode blocks or figures labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | An efficient MATLAB implementation of the EDPP screening rules combined with the solvers from SLEP package (Liu et al., 2009) for both Lasso and group Lasso is available at http://dpc-screening.github.io/. |
| Open Datasets | Yes | We evaluate our screening rules on synthetic and real data sets from many different applications. ... a) the Prostate Cancer (Petricoin et al., 2002); b) the PIE face image data set (Sim et al., 2003); c) the MNIST handwritten digit data set (Lecun et al., 1998). ... The Prostate Cancer data set (Petricoin et al., 2002) is obtained by protein mass spectrometry. ... The PIE face image data set used in this experiment (Cai et al., 2007) contains 11554 gray face images of 68 people... The MNIST data set contains gray images of scanned handwritten digits, including 60,000 for training and 10,000 for testing. ... The Colon Cancer data set (Alon et al., 1999) contains gene expression information... The Lung Cancer data set (Bhattacharjee et al., 2001) contains gene expression information... COIL-100 image data set (Nene et al., 1996; Cai et al., 2011). |
| Dataset Splits | Yes | The MNIST data set contains gray images of scanned handwritten digits, including 60,000 for training and 10,000 for testing. ... In each trial, we randomly select 5000 images for each digit from the training set (and in total we have 50000 images) and get a data matrix X ∈ ℝ^{784×50000}. Then in each trial, we randomly select an image from the testing set as the response y ∈ ℝ^{784}. ... The SVHN data set contains color images of street view house numbers, including 73257 images for training and 26032 for testing. ... In each trial, we first randomly select an image as the response y ∈ ℝ^{3072}, and then use the remaining ones to form the data matrix X ∈ ℝ^{3072×99288}. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware used for running its experiments, such as CPU or GPU models. |
| Software Dependencies | Yes | An efficient MATLAB implementation of the EDPP screening rules combined with the solvers from SLEP package (Liu et al., 2009) for both Lasso and group Lasso is available at http://dpc-screening.github.io/. |
| Experiment Setup | Yes | For each data set, we run the solver with or without the screening rules to solve the Lasso problem along a sequence of 100 parameter values equally spaced on the λ/λmax scale from 0.05 to 1.0. ... We simulate data from the true model y = Xβ + σϵ, ϵ ∼ N(0, 1). ... Throughout this section, σ is set to be 0.1. To construct β, we randomly select p components which are populated from a uniform [−1, 1] distribution, and set the remaining ones as 0. ... we run the solver with or without screening rules to solve the Lasso problems along a sequence of 100 parameter values equally spaced on the λ/λmax scale from 0.05 to 1.0. We then run 100 trials and report the average performance. |
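
The synthetic setup quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' MATLAB code: the function names, the matrix sizes, and the choice of i.i.d. standard normal entries for X are assumptions (the excerpt does not say how X is generated), and λmax is computed as max_i |x_i^T y|, which assumes the Lasso objective is written as ½‖y − Xβ‖² + λ‖β‖₁.

```python
import numpy as np


def simulate_lasso_data(n, p, n_nonzero, sigma=0.1, seed=0):
    """Draw (X, y, beta) from y = X beta + sigma * eps, eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    # Assumption: i.i.d. standard normal design; the quoted text does not specify X.
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    # Randomly chosen components are populated from uniform[-1, 1]; the rest stay 0.
    support = rng.choice(p, size=n_nonzero, replace=False)
    beta[support] = rng.uniform(-1.0, 1.0, size=n_nonzero)
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta


def lambda_grid(X, y, num=100, lo=0.05, hi=1.0):
    """Return lambda_max and `num` values equally spaced on the lambda/lambda_max scale."""
    # Assumption: objective 0.5 * ||y - X b||^2 + lambda * ||b||_1,
    # for which the solution is all zeros once lambda >= max_i |x_i^T y|.
    lam_max = np.max(np.abs(X.T @ y))
    ratios = np.linspace(lo, hi, num)
    return lam_max, ratios * lam_max


# Hypothetical small sizes, chosen only so the sketch runs quickly.
X, y, beta = simulate_lasso_data(n=50, p=200, n_nonzero=10)
lam_max, lambdas = lambda_grid(X, y)
```

Each value in `lambdas` would then be passed to a Lasso solver, once with and once without screening, and the procedure repeated over independent trials (the quoted text uses 100) to average the results.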