Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives
Authors: Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder
JMLR 2021 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real and synthetic data demonstrate that our approach leads to models with considerably improved statistical performance (especially variable selection) compared to competing methods. |
| Researcher Affiliation | Academia | Antoine Dedieu EMAIL Operations Research Center Massachusetts Institute of Technology Cambridge, MA 02139, USA; Hussein Hazimeh EMAIL Operations Research Center Massachusetts Institute of Technology Cambridge, MA 02139, USA; Rahul Mazumder EMAIL Sloan School of Management Operations Research Center MIT Center for Statistics Massachusetts Institute of Technology Cambridge, MA 02142, USA |
| Pseudocode | Yes | Algorithm 1: Cyclic Coordinate Descent (CD); Algorithm 2: CD with Local Combinatorial Search; Algorithm 3: Fast Heuristic for Local Search when m = 1; Algorithm 4: Integrality Generation Algorithm (IGA) |
| Open Source Code | Yes | Our CD and local combinatorial optimization algorithms are publicly available through our fast C++/R toolkit L0Learn: on CRAN at https://cran.r-project.org/package=L0Learn and also at https://github.com/hazimehh/L0Learn. |
| Open Datasets | Yes | We consider the following three binary classification data sets taken from the NIPS 2003 Feature Selection Challenge (Guyon et al., 2005): Arcene: This data set is used to identify cancer vs non-cancerous patterns in massspectrometric data. ... Dorothea: This data set is used to distinguish active chemical compounds in a drug. ... Dexter: The task here is to identify text documents discussing corporate acquisitions. |
| Dataset Splits | Yes | Arcene: ... We used 140 observations for training and 40 observations for testing. Dorothea: ... We used 805 observations for training and 230 observations for testing. Dexter: ... We used 420 observations for training and 120 observations for testing. |
| Hardware Specification | Yes | The experiments were carried out on a machine with a 6-core Intel Core i7-8750H processor and 16GB of RAM, running mac OS 10.15.7, R 4.0.3 (with vec Lib s BLAS implementation), and MATLAB R2020b. ... For this experiment, we use a machine with a 12-core Intel Xeon E5 @ 2.7 GHz and 64GB of RAM, running OSX 10.13.6 and Gurobi v8.1. |
| Software Dependencies | Yes | The experiments were carried out on a machine with a 6-core Intel Core i7-8750H processor and 16GB of RAM, running mac OS 10.15.7, R 4.0.3 (with vec Lib s BLAS implementation), and MATLAB R2020b. ... We use Gurobi s MIP solver for our experiments. ... running OSX 10.13.6 and Gurobi v8.1. |
| Experiment Setup | Yes | We use ℓ0-ℓq as a shorthand to denote the penalty λ0 β 0 + λq β q q for q {1, 2}. For all penalties that involve 2 tuning parameters i.e., ℓ0-ℓq (for q {1, 2}), MCP, Gra SP, and NHTP, we sweep the parameters over a two-dimensional grid. For our penalties, we choose 100 λ0 values as described in Section 2.4. For Gra SP and NHTP, we sweep the number of nonzeros between 1 and 100. For ℓ0-ℓ1, and we choose a sequence of 10 λ1-values in [a, b], where a corresponds to a zero solution and b = 10 4a. Similarly, for ℓ0-ℓ2, Gra SP, and NHTP, we choose 10 λ2 values between 10 4 and 100 for the experiment in Section 5.2; and between 10 8 and 10 4 for that in Section 5.3. For MCP, the sequence of 100 λ values is set to the default values selected by ncvreg, and we vary the second parameter γ over 10 values between 1.5 and 25. For the ℓ1-penalty, the grid of 100 λ values is set to the default sequence chosen by glmnet. ... For all toolkits except NHTP,13 we set the convergence threshold to 10 6 and solve for a regularization path with 100 solutions. For L0Learn, we use a grid of λ0-values varying between λmax 0 (the value which sets all coefficients to zero) and 0.001λmax 0 . For L0Learn and NHTP, we set λ2 = 10 7. |