reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

abess: A Fast Best-Subset Selection Library in Python and R

Authors: Jin Zhu, Xueqin Wang, Liyuan Hu, Junhao Huang, Kangkang Jiang, Yanhang Zhang, Shiyun Lin, Junxian Zhu

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We compare abess with popular variable selection libraries in Python and R through regression, classification, and PCA. All computations are conducted on a Ubuntu platform with Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz and 48 RAM. Table 2 displays the regression and classification analysis results, suggesting abess derives parsimonious models that achieve competitive performance in few minutes. Particularly, for the cancer data set, it is more than 20x faster than scikit-learn (ℓ1). The results of the sparse PCA (SPCA) are demonstrated in Table 3.
Researcher Affiliation	Academia	Jin Zhu (1), Xueqin Wang (2), Liyuan Hu (1), Junhao Huang (1), Kangkang Jiang (1), Yanhang Zhang (3), Shiyun Lin (4), Junxian Zhu (5). (1) Department of Statistical Science, Sun Yat-Sen University, Guangzhou, GD, China; (2) Department of Statistics and Finance/International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui, China; (3) School of Statistics, Renmin University of China, Beijing, China; (4) Center for Statistical Science, Peking University, Beijing, China; (5) Saw Swee Hock School of Public Health, National University of Singapore, Singapore
Pseudocode	No	The paper includes code snippets in Figure 2 and Figure 3, but these are complete executable code examples (R and Python) rather than structured pseudocode or algorithm blocks describing a method in a generic, language-agnostic way.
Open Source Code	Yes	The core of the library is programmed in C++. For ease of use, a Python library is designed for convenient integration with scikit-learn, and it can be installed from the Python Package Index (PyPI). In addition, a user-friendly R library is available at the Comprehensive R Archive Network (CRAN). The source code is available at: https://github.com/abess-team/abess.
Open Datasets	Yes	We are grateful to UCI Machine Learning Repository for sharing the superconductivity and musk data sets. Table 2: Average performance on the superconductivity data set (for regression), the cancer and the musk data sets (for classification) (Chin et al., 2006; Dua and Graff, 2017; Hamidieh, 2018) based on 20 randomly drawn test sets. The data set has 217 observations, each of which has 1,413 genetic factors (Christensen et al., 2009).
Dataset Splits	Yes	Table 2: Average performance on the superconductivity data set (for regression), the cancer and the musk data sets (for classification) (Chin et al., 2006; Dua and Graff, 2017; Hamidieh, 2018) based on 20 randomly drawn test sets. Figure 3 also shows an example using GridSearchCV with cv=5: `grid_search = GridSearchCV(pipe, param_grid, scoring=scorer, cv=5)`
Hardware Specification	Yes	All computations are conducted on a Ubuntu platform with Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz and 48 RAM.
Software Dependencies	Yes	abess can run on most Linux distributions, Windows 32 or 64-bit, and macOS with Python (version 3.6) or R (version 3.1.0), and can be easily installed from PyPI and CRAN. Python version is 3.9.1 and R version is 3.6.3. Library versions: scikit-learn (ℓ1) 1.0.0, celer 0.6.1, elasticnet 1.3.0.
Experiment Setup	Yes	Figure 3 illustrates the integration of the abess Python interface with scikit-learn's modules to build a non-linear model for diagnosing malignant tumors. The code block shows specific parameters for `PolynomialFeatures` (`include_bias=False`, `degree:[1, 2, 3]`, `interaction_only:[True, False]`) and `GridSearchCV` (`cv=5`).
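
The grid search described in the Experiment Setup row can be sketched as follows. This is a minimal illustration and not the paper's Figure 3. It makes three assumptions: the data are synthetic (generated with `make_classification`), scikit-learn's own `LogisticRegression` stands in for the abess estimator, and the step names `poly`, `scale`, and `clf` and the accuracy-based `scorer` are chosen here for illustration. Only `include_bias=False`, the `degree` and `interaction_only` grids, and `cv=5` come from the quoted text.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: small synthetic binary-classification data instead of the
# tumor-diagnosis data set used in the paper's example.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

pipe = Pipeline([
    # include_bias=False is taken from the quoted setup.
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    # Assumption: scikit-learn's estimator stands in for the abess classifier.
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid values from the quoted setup; the "poly__" prefix follows the
# hypothetical step name chosen above.
param_grid = {
    "poly__degree": [1, 2, 3],
    "poly__interaction_only": [True, False],
}

# Assumption: accuracy as the scoring function; the quoted text only shows
# that a `scorer` object is passed.
scorer = make_scorer(accuracy_score)

grid_search = GridSearchCV(pipe, param_grid, scoring=scorer, cv=5)
grid_search.fit(X, y)

print(grid_search.best_params_)
print(round(grid_search.best_score_, 3))
```

The grid has 3 × 2 = 6 candidate settings, each evaluated with 5-fold cross-validation. Replacing the `clf` step with a scikit-learn-compatible abess estimator should leave the rest of the pipeline unchanged, which is the kind of integration the quoted passage describes.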