skglm: Improving scikit-learn for Regularized Generalized Linear Models

Authors: Badr Moufad, Pierre-Antoine Bannier, Quentin Bertrand, Quentin Klopfenstein, Mathurin Massias

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 2 showcases the speed of skglm on three benchmarks. "For transparent and reproducible benchmarks, we used benchopt (Moreau et al., 2022). Primarily interested in sparse penalties, we focus on the case where the number of features is much greater than the number of samples; in the reversed configuration the results may vary."
Researcher Affiliation | Academia | Badr Moufad (1), Pierre-Antoine Bannier, Quentin Bertrand (3), Quentin Klopfenstein (2), Mathurin Massias (1). (1) Univ Lyon, Inria, CNRS, ENS de Lyon, UCB Lyon 1, LIP UMR 5668, F-69342, Lyon, France; (2) University of Luxembourg, LCSB, Esch-sur-Alzette, Luxembourg; (3) Mila & UdeM, Canada.
Pseudocode | No | The paper includes "Code snippets for solving MCP regression" in Figure 1, but these are actual code examples, not pseudocode or a clearly labeled algorithm block. The methodology is described in prose rather than structured pseudocode.
Open Source Code | Yes | "We introduce skglm, an open-source Python package for regularized Generalized Linear Models. ... skglm is an open-source package licensed under BSD 3-Clause and hosted on GitHub." Repository: https://github.com/scikit-learn-contrib/skglm.
Open Datasets | Yes | Figure 2: "Timing comparison on three problems: Lasso, Sparse Cox, and Group Lasso; on the datasets: MEG, Breast-Cancer, and Drug Potency." The benchmarks can be reproduced and extended via https://github.com/benchopt/benchmark_lasso (Lasso), https://github.com/benchopt/benchmark_cox (sparse Cox), and https://github.com/benchopt/benchmark_group_lasso (Group Lasso).
Dataset Splits | No | The paper mentions using datasets such as MEG, Breast-Cancer, and Drug Potency for benchmarking but does not provide specific details on how these datasets were split into training, validation, or test sets, nor does it refer to any standard or pre-defined splits.
Hardware Specification | Yes | Figure 2: "The benchmark was performed using a laptop with specifications: CPU 12th Gen Intel Core i7-12700H @ 2.7 GHz, 20 cores, 32 GB of RAM."
Software Dependencies | No | "skglm is entirely written in Python. ... skglm relies on Numpy (Harris et al., 2020) and Scipy (Virtanen et al., 2020) for dense and sparse array operations. ... JIT-compiled by Numba (Lam et al., 2015). ... skglm estimators are fully compliant with scikit-learn: they inherit from scikit-learn's base classes and pass the test function sklearn.utils.estimator_checks.check_estimator." Although various software components are mentioned (Python, Numpy, Scipy, Numba, scikit-learn), specific version numbers for these dependencies are not provided in the text.
Experiment Setup | No | The paper describes the `skglm` package, its components (datafits, penalties, solvers), and benchmarks its speed. However, it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or other training configurations for the benchmarks presented.
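The Pseudocode row above notes that the paper's Figure 1 shows code snippets for solving MCP regression rather than pseudocode. For context, here is a minimal numpy sketch of the MCP proximal (thresholding) operator that coordinate-descent solvers for such problems apply elementwise. This is not the paper's own snippet; it assumes the standard MCP parameterization with penalty strength `alpha` and concavity parameter `gamma > 1`, and unit step size.

```python
import numpy as np

def prox_mcp(z, alpha, gamma):
    """Proximal operator of the MCP penalty (step size 1), elementwise.

    Assumes the standard MCP definition with gamma > 1: entries with
    |z| <= alpha are set to 0, entries with |z| > gamma * alpha are
    left untouched, and entries in between are shrunk less
    aggressively than by the Lasso's soft-thresholding.
    """
    z = np.asarray(z, dtype=float)
    return np.where(
        np.abs(z) <= gamma * alpha,
        np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0) / (1.0 - 1.0 / gamma),
        z,
    )

# Unlike soft-thresholding, large coefficients are not shrunk at all:
print(prox_mcp([0.5, 2.0, 5.0], alpha=1.0, gamma=3.0))  # [0.  1.5 5. ]
```

The debiasing behavior on large coefficients (the last entry passes through unchanged) is the practical motivation for MCP over the Lasso penalty.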
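The Software Dependencies row quotes the paper's claim that skglm estimators pass `sklearn.utils.estimator_checks.check_estimator`. That function is scikit-learn's public API conformance suite and can be run on any estimator instance. A brief sketch, shown on scikit-learn's own `Lasso` since skglm may not be installed here; per the paper, a skglm estimator would be passed the same way:

```python
from sklearn.linear_model import Lasso
from sklearn.utils.estimator_checks import check_estimator

# check_estimator runs scikit-learn's API conformance checks
# (get_params/set_params round-trips, fit/predict contracts, input
# validation, cloning, ...) and raises on any failure.
check_estimator(Lasso())  # completes silently for a compliant estimator
```

Passing this suite is what allows skglm estimators to be dropped into scikit-learn tooling such as `Pipeline` and `GridSearchCV`.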