Dual Extrapolation for Sparse GLMs

Authors: Mathurin Massias, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

JMLR 2020 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we numerically illustrate the benefits of dual extrapolation on various data sets. Implementation is done in Python, Cython (Behnel et al., 2011) and numba (Lam et al., 2015) for the low-level critical parts. The solvers exactly follow the scikit-learn API (Pedregosa et al., 2011; Buitinck et al., 2013), so that Celer can be used as a drop-in replacement in existing code. The package is available under BSD3 license at https://github. com/mathurinm/celer, with documentation and examples at https://mathurinm.github. io/celer. In all this section, the estimator-specific λmax refers to the smallest value giving a null solution (for instance λmax = X y in the Lasso case, λmax = X y /2 for sparse logistic regression, and λmax = X Y 2, for the Multitask Lasso). Figures 4 to 6, one can see that using acceleration systematically improves the performance of Gap Safe rules, up to a factor 3.
Researcher Affiliation Academia Mathurin Massias EMAIL Universit e Paris-Saclay, Inria, CEA, Palaiseau, France Samuel Vaiter EMAIL CNRS & Institut de Math ematiques de Bourgogne, 21078, Dijon, France Alexandre Gramfort EMAIL Universit e Paris-Saclay, Inria, CEA, Palaiseau, France Joseph Salmon EMAIL IMAG, Univ Montpellier, CNRS, 34095, Montpellier, France
Pseudocode Yes Algorithm 1 PG/cyclic CD for Problem (1) with dual extrapolation Algorithm 2 Celer for Problem (1) Algorithm 3 create WS Algorithm 4 Prox-Newton subproblem solver (illustrated on logistic regression) Algorithm 5 newton direction (illustrated on logistic regression) Algorithm 6 backtracking (illustrated on logistic regression)
Open Source Code Yes The package is available under BSD3 license at https://github. com/mathurinm/celer, with documentation and examples at https://mathurinm.github. io/celer.
Open Datasets Yes Table 1: Characteristics of datasets used name n p q density leukemia 72 7,129 1 news20 19,996 632,983 6.1 10 4 rcv1 train 20,242 19,960 3.7 10 3 finance (log1p) 16,087 1,668,738 3.4 10 3 Magnetoencephalography (MEG) 305 7,498 49 1 Path computation For a fine (resp. coarse) grid of 100 (resp. 10) values of λ geometrically distributed between λmax and λmax/100, the competing algorithms solve the Lasso on various real world datasets from LIBSVM5 (Fan et al., 2008). The data for this experiment uses magnetoencephalography (MEG) recordings which are collected for neuroscience studies. Here we use data from the sample dataset of the MNE software (Gramfort et al., 2014).
Dataset Splits No The paper does not explicitly provide details about training/test/validation dataset splits, specific split percentages, or sample counts. It mentions using 'warm start' and 'path computation' for different lambda values, but this describes the model training strategy rather than data partitioning.
Hardware Specification No The paper does not provide specific hardware details (e.g., CPU, GPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies No Implementation is done in Python, Cython (Behnel et al., 2011) and numba (Lam et al., 2015) for the low-level critical parts. The solvers exactly follow the scikit-learn API (Pedregosa et al., 2011; Buitinck et al., 2013), so that Celer can be used as a drop-in replacement in existing code. The package is available under BSD3 license at https://github. com/mathurinm/celer, with documentation and examples at https://mathurinm.github. io/celer. Problem (73) can be solved straightforwardly with solvers such as CVXPY (Diamond and Boyd, 2016)
Experiment Setup Yes For a fine (resp. coarse) grid of 100 (resp. 10) values of λ geometrically distributed between λmax and λmax/100, the competing algorithms solve the Lasso... The duality gap stopping criterion ϵ varies between 10^-2 and 10^-6. Warm start is used for all algorithms... The stopping criterion for the inner solver on W(t) is to reach a gap lower than a fraction ρ = 0.3 of the duality gap for the whole problem... Algorithm 1: param: T, K = 5, fdual = 10. Algorithm 2: param: K = 5, p(1) = 100, ϵ, MAX WS. Algorithm 3: param: p(1) = 100, ρ = 0.3. Algorithm 4: param: MAX CD = 20, MAX BACKTRACK = 10, K = 5. MAX CD is set to 1 for the computation of the first Prox-Newton direction.