Generalization error bounds for multiclass sparse linear classifiers

Authors: Tomer Levy, Felix Abramovich

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To illustrate the performance of the derived sparse multinomial logistic regression classifiers, we applied them to the Cancer sites data set considered in Vincent and Hansen (2014). It consists of bead-based expression data for n = 162 samples with d = 372 features (microRNAs) from L = 18 classes of normal and cancer tissue samples; the number of samples per class ranges from 5 to 26. Vincent and Hansen (2014) used the sparse group Lasso classifier for this data. We compared the performance of sparse group Slope, with λ_j's and κ_l's of the form given in (14), against sparse group Lasso (replicating Vincent and Hansen, 2014), random forest, and the well-known XGBoost gradient boosting trees classifier on the above data set, where we developed a proximal gradient algorithm for solving the sparse group Slope problem in (12) (see Appendix D). To remove various technical variations, following Vincent and Hansen (2014), the data was first normalized by centering and scaling the rows of the design matrix, and then standardized by centering and scaling the columns. We split the data into training (75%) and test (25%) sets. The tuning parameters of all classification procedures were chosen by 10-fold cross-validation on the training set, and the misclassification errors of the resulting classifiers were measured on the test set. We repeated the process 10 times, randomly partitioning the data into training and test sets. Table 1 presents the average (over 10 random splits) misclassification errors on the test sets, the numbers of selected features (non-zero rows of the regression coefficient matrix B), and the overall numbers of non-zero coefficients in B. It shows that both sparse multinomial logistic regression classifiers outperform their nonparametric counterparts on this data. Sparse group Slope yielded smaller misclassification errors than sparse group Lasso and, in addition, resulted in much sparser models.
Researcher Affiliation | Academia | Tomer Levy (EMAIL), Department of Statistics and Operations Research, Tel Aviv University; Felix Abramovich (EMAIL), Department of Statistics and Operations Research, Tel Aviv University
Pseudocode | Yes | Appendix D. Sparse group Slope algorithm. The penalized MLE minimization problem in (12) involves a sum of a convex smooth log-likelihood and a convex but non-smooth penalty consisting of two terms. A common approach to solving such optimization problems is the proximal gradient method (e.g., Beck, 2017). The proximal operator of a given convex function f is defined as prox_f(a) = arg min_b { (1/2) ||a − b||^2 + f(b) }. For the setup at hand, consider the proximal operator prox_{κ,λ}(A) = arg min_B { (1/2) ||A − B||_F^2 + ||B||_{κ,λ} }, where, recall, ||B||_{κ,λ} = Σ_{j=1}^d λ_j ||B||_(j) + Σ_{j=1}^d Σ_{l=1}^L κ_l |B|_{j(l)} = ||B||_λ + Σ_{j=1}^d ||B_j||_κ. Efficient proximal gradient algorithms exist for computing the operators prox_κ and prox_λ separately (see, respectively, Bogdan et al., 2015; Brzyski et al., 2019). We now show that applying prox_κ and prox_λ consecutively yields prox_{κ,λ}, as depicted in Algorithm 1. Algorithm 1: prox_{κ,λ}(A): for j ← 1 … d do U_j ← prox_κ(A_j); then B ← prox_λ(U).
Open Source Code | No | The paper does not contain any explicit statements about code release or links to a code repository.
Open Datasets | Yes | To illustrate the performance of the derived sparse multinomial logistic regression classifiers we applied them to the data set Cancer sites considered in Vincent and Hansen (2014).
Dataset Splits | Yes | We split the data into training (75%) and test (25%) sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions using 'random forest' and 'XGBoost classifiers', and solving with a 'proximal gradient algorithm', but does not specify any version numbers for these software components or libraries.
Experiment Setup | Yes | The tuning parameters of all classification procedures were chosen by 10-fold cross-validation on the training set, and the misclassification errors of the resulting classifiers were measured on the test set. We repeated the process 10 times, randomly partitioning the data into train and test sets.
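The Algorithm 1 quoted above composes the two Slope proximal operators: prox_κ applied row-wise, then prox_λ applied to the row norms. A minimal numerical sketch is below, assuming the standard stack-based prox for the sorted-L1 norm (Bogdan et al., 2015) and the norm-rescaling form of the group-Slope prox (Brzyski et al., 2019); `prox_sorted_l1` and `prox_group_slope` are illustrative names, not the authors' implementation.

```python
import numpy as np

def prox_sorted_l1(y, lam):
    """Prox of the sorted-L1 (Slope) norm with non-increasing weights lam >= 0.

    Stack-based block-averaging sketch in the spirit of Bogdan et al. (2015).
    """
    sign, a = np.sign(y), np.abs(y)
    order = np.argsort(-a)                  # sort |y| in decreasing order
    d = a[order] - lam                      # per-coordinate differences
    blocks = []                             # each block: [start, end, sum]
    for i, di in enumerate(d):
        blocks.append([i, i, di])
        # Merge adjacent blocks until block averages are strictly decreasing.
        while len(blocks) > 1 and (
            blocks[-2][2] / (blocks[-2][1] - blocks[-2][0] + 1)
            <= blocks[-1][2] / (blocks[-1][1] - blocks[-1][0] + 1)
        ):
            s, e, t = blocks.pop()
            blocks[-1][1] = e
            blocks[-1][2] += t
    x_sorted = np.empty_like(d)
    for s, e, t in blocks:
        x_sorted[s:e + 1] = max(t / (e - s + 1), 0.0)  # clip at zero
    x = np.empty_like(x_sorted)
    x[order] = x_sorted                     # undo the sort
    return sign * x

def prox_group_slope(A, kappa, lam):
    """Algorithm 1 sketch: prox_kappa row-wise, then prox_lambda on row norms."""
    U = np.vstack([prox_sorted_l1(A[j], kappa) for j in range(A.shape[0])])
    norms = np.linalg.norm(U, axis=1)
    new_norms = prox_sorted_l1(norms, lam)  # shrink the vector of row norms
    scale = np.divide(new_norms, norms, out=np.zeros_like(norms), where=norms > 0)
    return U * scale[:, None]
```

With all weights zero both operators reduce to the identity, and with constant weights `prox_sorted_l1` reduces to ordinary soft thresholding, which gives quick sanity checks on the sketch.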
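The evaluation protocol reported in the table (10 random 75%/25% splits, tuning by 10-fold cross-validation on the training set, error measured on the held-out test set) can be sketched as follows. Since no code is released, this is a generic illustration on synthetic data of the same shape as the Cancer sites set, with a simple nearest-centroid stand-in classifier; the data, the model, and the tuning grid over `k` are all placeholders, not the paper's method.

```python
import numpy as np

def fit_predict(X_tr, y_tr, X_te, k):
    """Stand-in classifier: nearest centroid on the k highest-variance features."""
    idx = np.argsort(X_tr.var(axis=0))[-k:]
    classes = np.unique(y_tr)
    cents = np.stack([X_tr[y_tr == c][:, idx].mean(axis=0) for c in classes])
    dist = ((X_te[:, None, idx] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def cv_error(X, y, k, folds=10, seed=0):
    """Misclassification error estimated by k-fold cross-validation."""
    fold_ids = np.array_split(np.random.default_rng(seed).permutation(len(y)), folds)
    errs = []
    for test_idx in fold_ids:
        mask = np.ones(len(y), dtype=bool)
        mask[test_idx] = False
        pred = fit_predict(X[mask], y[mask], X[~mask], k)
        errs.append((pred != y[~mask]).mean())
    return float(np.mean(errs))

# Synthetic stand-in for the Cancer sites data: 162 samples, 372 features, 18 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(162, 372))
y = rng.integers(0, 18, size=162)

test_errors = []
for split in range(10):                        # 10 random train/test partitions
    perm = np.random.default_rng(split).permutation(len(y))
    tr, te = perm[:121], perm[121:]            # ~75% train / 25% test
    best_k = min([10, 50, 200], key=lambda k: cv_error(X[tr], y[tr], k))
    pred = fit_predict(X[tr], y[tr], X[te], best_k)
    test_errors.append((pred != y[te]).mean())

print(f"average test misclassification error: {np.mean(test_errors):.3f}")
```

The averaging over 10 random partitions mirrors how Table 1's figures are reported; on this unstructured synthetic data the error is naturally near chance level.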