reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Lasso Screening Rules via Dual Polytope Projection

Authors: Jie Wang, Peter Wonka, Jieping Ye

JMLR 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We have evaluated our screening rule using synthetic and real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso. ... We evaluate our screening rules on synthetic and real data sets from many different applications. In Section 4, the experimental results demonstrate that our rules are more effective in discarding inactive features than existing state-of-the-art screening rules.
Researcher Affiliation	Academia	Jie Wang EMAIL Department of Computational Medicine and Bioinformatics University of Michigan Ann Arbor, MI 48109-2218, USA Peter Wonka EMAIL Department of Computer Science and Engineering Arizona State University Tempe, AZ 85287-8809, USA Jieping Ye EMAIL Department of Computational Medicine and Bioinformatics Department of Electrical Engineering and Computer Science University of Michigan Ann Arbor, MI 48109-2218, USA
Pseudocode	No	The paper describes algorithms and methods but does not present them in explicit pseudocode blocks or figures labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code	Yes	An efficient MATLAB implementation of the EDPP screening rules combined with the solvers from SLEP package (Liu et al., 2009) for both Lasso and group Lasso is available at http://dpc-screening.github.io/.
Open Datasets	Yes	We evaluate our screening rules on synthetic and real data sets from many different applications. ... a) the Prostate Cancer (Petricoin et al., 2002); b) the PIE face image data set (Sim et al., 2003); c) the MNIST handwritten digit data set (Lecun et al., 1998). ... The Prostate Cancer data set (Petricoin et al., 2002) is obtained by protein mass spectrometry. ... The PIE face image data set used in this experiment (Cai et al., 2007) contains 11554 gray face images of 68 people... The MNIST data set contains gray images of scanned handwritten digits, including 60,000 for training and 10,000 for testing. ... The Colon Cancer data set (Alon et al., 1999) contains gene expression information... The Lung Cancer data set (Bhattacharjee et al., 2001) contains gene expression information... COIL-100 image data set (Nene et al., 1996; Cai et al., 2011).
Dataset Splits	Yes	The MNIST data set contains gray images of scanned handwritten digits, including 60,000 for training and 10,000 for testing. ... In each trial, we randomly select 5000 images for each digit from the training set (and in total we have 50000 images) and get a data matrix X ∈ ℝ^{784×50000}. Then in each trial, we randomly select an image from the testing set as the response y ∈ ℝ^{784}. ... The SVHN data set contains color images of street view house numbers, including 73257 images for training and 26032 for testing. ... In each trial, we first randomly select an image as the response y ∈ ℝ^{3072}, and then use the remaining ones to form the data matrix X ∈ ℝ^{3072×99288}.
Hardware Specification	No	The paper does not explicitly describe any specific hardware used for running its experiments, such as CPU or GPU models.
Software Dependencies	Yes	An efficient MATLAB implementation of the EDPP screening rules combined with the solvers from SLEP package (Liu et al., 2009) for both Lasso and group Lasso is available at http://dpc-screening.github.io/.
Experiment Setup	Yes	For each data set, we run the solver with or without the screening rules to solve the Lasso problem along a sequence of 100 parameter values equally spaced on the λ/λmax scale from 0.05 to 1.0. ... We simulate data from the true model y = Xβ + σϵ, ϵ ∼ N(0, 1). ... Throughout this section, σ is set to be 0.1. To construct β, we randomly select p components which are populated from a uniform [−1, 1] distribution, and set the remaining ones as 0. ... we run the solver with or without screening rules to solve the Lasso problems along a sequence of 100 parameter values equally spaced on the λ/λmax scale from 0.05 to 1.0. We then run 100 trials and report the average performance.
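
The synthetic setup quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' MATLAB code. The function names, the problem sizes, and the i.i.d. Gaussian design matrix are assumptions made here, since the quoted excerpt does not say how X is drawn. The value λmax = ‖Xᵀy‖∞ is the standard smallest λ at which the Lasso solution is all zeros for the objective ½‖y − Xβ‖² + λ‖β‖₁.

```python
import numpy as np


def make_synthetic_lasso_data(n, p, n_nonzero, sigma=0.1, seed=0):
    """Simulate y = X beta + sigma * eps with eps ~ N(0, 1).

    Assumption: X has i.i.d. standard normal entries; the quoted
    excerpt does not specify the design matrix.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    # Randomly chosen components are drawn from uniform [-1, 1]; the rest stay 0.
    support = rng.choice(p, size=n_nonzero, replace=False)
    beta[support] = rng.uniform(-1.0, 1.0, size=n_nonzero)
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta


def lambda_grid(X, y, num=100, lo=0.05, hi=1.0):
    """Return `num` values equally spaced on the lambda/lambda_max scale."""
    lam_max = np.max(np.abs(X.T @ y))  # ||X^T y||_inf
    return lam_max * np.linspace(lo, hi, num), lam_max


# Hypothetical problem sizes, chosen only to keep the example small.
X, y, beta = make_synthetic_lasso_data(n=50, p=200, n_nonzero=10)
lambdas, lam_max = lambda_grid(X, y)
```

The grid is returned in ascending order. A pathwise solver would typically walk it from the largest value down, but the excerpt does not state the order used in the paper.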