skglm: Improving scikit-learn for Regularized Generalized Linear Models

Authors: Badr Moufad, Pierre-Antoine Bannier, Quentin Bertrand, Quentin Klopfenstein, Mathurin Massias

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 2 showcases the speed of skglm on three benchmarks. "For transparent and reproducible benchmarks, we used benchopt (Moreau et al., 2022). Primarily interested in sparse penalties, we focus on the case where the number of features is much greater than the number of samples; in the reversed configuration the results may vary."
Researcher Affiliation | Academia | Badr Moufad (1), Pierre-Antoine Bannier, Quentin Bertrand (3), Quentin Klopfenstein (2), Mathurin Massias (1). (1) Univ Lyon, Inria, CNRS, ENS de Lyon, UCB Lyon 1, LIP UMR 5668, F-69342, Lyon, France; (2) University of Luxembourg, LCSB, Esch-sur-Alzette, Luxembourg; (3) Mila & UdeM, Canada.
Pseudocode | No | The paper includes "Code snippets for solving MCP regression" in Figure 1, but these are actual code examples, not pseudocode or a clearly labeled algorithm block. The methodology is described in prose rather than structured pseudocode.
Open Source Code | Yes | "We introduce skglm, an open-source Python package for regularized Generalized Linear Models. ... skglm is an open-source package licensed under BSD 3-Clause and hosted on GitHub." Repository: https://github.com/scikit-learn-contrib/skglm.
Open Datasets | Yes | Figure 2: "Timing comparison on three problems: Lasso, Sparse Cox, and Group Lasso; on the datasets: MEG, Breast-Cancer, and Drug Potency." The benchmarks can be reproduced and extended via https://github.com/benchopt/benchmark_lasso (Lasso), https://github.com/benchopt/benchmark_cox (sparse Cox), and https://github.com/benchopt/benchmark_group_lasso (Group Lasso).
Dataset Splits | No | The paper mentions using datasets such as MEG, Breast-Cancer, and Drug Potency for benchmarking but does not provide specific details on how these datasets were split into training, validation, or test sets, nor does it refer to any standard or pre-defined splits.
Hardware Specification | Yes | Figure 2: "The benchmark was performed using a laptop with specifications: CPU 12th Gen Intel Core i7-12700H @ 2.7 GHz, 20 cores, 32 GB of RAM."
Software Dependencies | No | "skglm is entirely written in Python. ... skglm relies on Numpy (Harris et al., 2020) and Scipy (Virtanen et al., 2020) for dense and sparse array operations. ... JIT-compiled by Numba (Lam et al., 2015). ... skglm estimators are fully compliant with scikit-learn: they inherit from scikit-learn's base classes and pass the test function sklearn.utils.estimator_checks.check_estimator." Although various software components are mentioned (Python, Numpy, Scipy, Numba, scikit-learn), specific version numbers for these dependencies are not provided in the text.
Experiment Setup | No | The paper describes the `skglm` package, its components (datafits, penalties, solvers), and benchmarks its speed. However, it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or other training configurations for the benchmarks presented.
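The Pseudocode row above notes that the paper's Figure 1 shows code snippets for solving MCP regression rather than pseudocode. For context, here is a minimal numpy sketch of the MCP proximal (thresholding) operator that coordinate-descent solvers for such problems apply elementwise. This is not the paper's own snippet; it assumes the standard MCP parameterization with penalty strength `alpha` and concavity parameter `gamma > 1`, and unit step size.

```python
import numpy as np

def prox_mcp(z, alpha, gamma):
    """Proximal operator of the MCP penalty (step size 1), elementwise.

    Assumes the standard MCP definition with gamma > 1: entries with
    |z| <= alpha are set to 0, entries with |z| > gamma * alpha are
    left untouched, and entries in between are shrunk less
    aggressively than by the Lasso's soft-thresholding.
    """
    z = np.asarray(z, dtype=float)
    return np.where(
        np.abs(z) <= gamma * alpha,
        np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0) / (1.0 - 1.0 / gamma),
        z,
    )

# Unlike soft-thresholding, large coefficients are not shrunk at all:
print(prox_mcp([0.5, 2.0, 5.0], alpha=1.0, gamma=3.0))  # [0.  1.5 5. ]
```

The debiasing behavior on large coefficients (the last entry passes through unchanged) is the practical motivation for MCP over the Lasso penalty.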
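The Software Dependencies row quotes the paper's claim that skglm estimators pass `sklearn.utils.estimator_checks.check_estimator`. That function is scikit-learn's public API conformance suite and can be run on any estimator instance. A brief sketch, shown on scikit-learn's own `Lasso` since skglm may not be installed here; per the paper, a skglm estimator would be passed the same way:

```python
from sklearn.linear_model import Lasso
from sklearn.utils.estimator_checks import check_estimator

# check_estimator runs scikit-learn's API conformance checks
# (get_params/set_params round-trips, fit/predict contracts, input
# validation, cloning, ...) and raises on any failure.
check_estimator(Lasso())  # completes silently for a compliant estimator
```

Passing this suite is what allows skglm estimators to be dropped into scikit-learn tooling such as `Pipeline` and `GridSearchCV`.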