Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On Regularized Square-root Regression Problems: Distributionally Robust Interpretation and Fast Computations
Authors: Hong T.M. Chu, Kim-Chuan Toh, Yangjing Zhang
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our algorithm is highly efficient for solving the square-root sparse group Lasso problems and the square-root fused Lasso problems. We conduct numerical experiments on synthetic and real data sets in Section 4, and give the conclusion in Section 5. |
| Researcher Affiliation | Academia | Hong T.M. Chu EMAIL Department of Mathematics National University of Singapore Singapore 119076 Kim-Chuan Toh EMAIL Department of Mathematics, and Institute of Operations Research and Analytics National University of Singapore Singapore 119076 Yangjing Zhang EMAIL Institute of Applied Mathematics, Academy of Mathematics and Systems Science Chinese Academy of Sciences People's Republic of China 100190 |
| Pseudocode | Yes | Algorithm 1: Proximal point algorithm for solving (20) Algorithm 2: Semismooth Newton method for solving (24) |
| Open Source Code | No | The paper uses third-party tools and data sets with provided links/citations but does not explicitly state that the source code for the methodology described in this paper is made publicly available by the authors. For example, it mentions "the R package WGCNA" and cites several data sources, but not the code for their PPDNA algorithm. |
| Open Datasets | Yes | We use the UCI repository (Asuncion and Newman, 2007; Chang and Lin, 2011) in this section. Climate data (Kalnay et al., 1996) The first data we use is the colon cancer data (Alon et al., 1999)1. It is available at http://www.weizmann.ac.il/mcb/UriAlon/download/downloadable-data. In addition, we use the lung cancer data (Monti et al., 2003)2, which has also been used in (Li et al., 2017). It is available at http://portals.broadinstitute.org/cgi-bin/cancer/publications/view/87. Another data tested is the acute leukemia data (Golub et al., 1999)3. This data includes 72 samples, and each sample Xi includes the expression profiles of 10713 (repeated) genes. It is available at https://github.com/wangyanyanwangyanyan/wangyanyan. |
| Dataset Splits | Yes | In this section, we present the numerical results of the square-root sparse group Lasso model on two real data sets which are equipped with natural group structures. For a given data set (X, Y ) in this section, we randomly split it into the training set (Xtrain, Ytrain) and the test set (Xtest, Ytest) so that the number of observations Ntrain of the training data set is roughly twice larger than the number of observations Ntest of the test set. Based on the training set, we first set (w1, w2) = (0, 1) and conduct 8-fold cross validation (CV) for selecting λ over the set. |
| Hardware Specification | Yes | All the experiments are performed in MATLAB (version 9.7) on a Windows workstation (6-core, Intel Core i7-8750H @ 2.20GHz, 8 Gigabytes of RAM). |
| Software Dependencies | Yes | All the experiments are performed in MATLAB (version 9.7) on a Windows workstation (6-core, Intel Core i7-8750H @ 2.20GHz, 8 Gigabytes of RAM). |
| Experiment Setup | Yes | For all the algorithms, we set tol to be 10^-7 and maxtime to be 30 minutes. In addition, we set maxiter for our algorithm to be 10^2, and for other algorithms to be 10^6. Algorithm 2: Semismooth Newton method for solving (24) ... η ∈ (0, 1), ϱ ∈ (0, 1]. ρ ∈ (0, 1), µ ∈ (0, 0.5). tol > 0. Algorithm 3: pADMM ... ρ = 1.618, µ > 0. For the sparse group Lasso regularizer, we always choose the weights of groups as ωl = √|Gl|. We choose λ ∈ {λBun, λStG, λBlG}, where the three tuning parameters λBun, λStG, and λBlG are theoretically optimal values... |
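The "Dataset Splits" and "Experiment Setup" evidence above describes the paper's evaluation protocol: a random split with Ntrain roughly twice Ntest, followed by 8-fold cross validation over candidate values of λ. Below is a minimal numpy sketch of that protocol. All function names are illustrative, and the closed-form ridge `fit` is a stand-in placeholder, not the authors' PPDNA solver; the square-root Lasso objective is included only to show the model being tuned.

```python
import numpy as np

def sqrt_lasso_objective(X, y, beta, lam):
    # Square-root Lasso objective: ||y - X @ beta||_2 + lam * ||beta||_1
    return np.linalg.norm(y - X @ beta) + lam * np.abs(beta).sum()

def split_two_to_one(n, rng):
    # Random split with Ntrain roughly twice Ntest, as quoted above
    idx = rng.permutation(n)
    n_train = (2 * n) // 3
    return idx[:n_train], idx[n_train:]

def eight_fold_cv_lambda(X, y, lambdas, fit, rng):
    # 8-fold CV over candidate lambdas; fit(Xtr, ytr, lam) returns beta
    n = len(y)
    folds = np.array_split(rng.permutation(n), 8)
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        errs = []
        for k in range(8):
            val = folds[k]
            tr = np.concatenate([folds[j] for j in range(8) if j != k])
            beta = fit(X[tr], y[tr], lam)
            errs.append(np.mean((y[val] - X[val] @ beta) ** 2))
        if np.mean(errs) < best_err:
            best_lam, best_err = lam, np.mean(errs)
    return best_lam

# Usage sketch: ridge closed form as a hypothetical stand-in estimator
rng = np.random.default_rng(0)
X = rng.standard_normal((24, 3))
y = X @ np.array([1.0, 0.0, 2.0])
train_idx, test_idx = split_two_to_one(len(y), rng)
ridge_fit = lambda A, b, lam: np.linalg.solve(
    A.T @ A + lam * np.eye(A.shape[1]), A.T @ b
)
lam_star = eight_fold_cv_lambda(X[train_idx], y[train_idx],
                                [1e-4, 1.0], ridge_fit, rng)
```

The split ratio (2:1) and the fold count (8) match the quoted setup; the candidate set for λ here is arbitrary, whereas the paper selects among the theoretically motivated values λBun, λStG, λBlG.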