Generalized Sparse Additive Models
Authors: Asad Haris, Noah Simon, Ali Shojaie
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical results with empirical studies comparing some existing methods within this framework. ... Keywords: Generalized Additive Models, Sparsity, Minimax, High-Dimensional, Penalized Regression ... 5. Simulation Study: In this section, to complement our theoretical results, we conduct a simulation study to study the finite sample performance of various GSAMs as a function of n. ... 6. Data Analysis; 6.1 Boston Housing Data |
| Researcher Affiliation | Academia | Asad Haris EMAIL, Department of Earth, Ocean and Atmospheric Sciences, University of British Columbia, 2020-2207 Main Mall, Vancouver, BC, Canada V6T 1Z4; Noah Simon EMAIL; Ali Shojaie EMAIL, Department of Biostatistics, University of Washington, Seattle, WA 98195-7232, USA |
| Pseudocode | Yes | Algorithm 1: General Proximal Gradient Algorithm for (3); Algorithm A.1: Block Coordinate Descent for Least Squares Loss |
| Open Source Code | Yes | The R package GSAM, available on https://github.com/asadharis/GSAM, implements the methods described in this paper. |
| Open Datasets | Yes | 6.1 Boston Housing Data: We use the methods of Section 5 to predict the value of owner-occupied homes in the suburbs of Boston using census data from 1970. ... As done in the data analysis by Ravikumar et al. (2009), we add 10 noise covariates uniformly generated on the unit interval and 10 additional noise covariates obtained by randomly permuting the original covariates. 6.2 Gene Expression Data: We used the Curated Microarray Database (CuMiDa) (Feltes et al., 2019): a repository of gene-expression data sets curated from the Gene Expression Omnibus (GEO). ... 1. Lung: ... accession number GSE19804. ... 2. Prostate: ... accession number GSE6919 U95B. ... 3. Breast: ... accession number GSE70947. ... 4. Oral cavity: ... accession number GSE42743. |
| Dataset Splits | Yes | Approximately 75% of the observations are used as the training set, and the mean square prediction error on the test set is reported. The final model is selected using 5-fold cross-validation with the 1-standard-error rule. Results are presented for 100 splits of the data into training and test sets. We split the data as follows: 60% as training, 20% as validation, and 20% as test data. |
| Hardware Specification | Yes | For 100 replications of the proximal problem on a quad-core Intel Core i7-10510U CPU @ 1.80GHz, the median run-time with n = 500 for P_st = P_sobolev was 693.20 µs. |
| Software Dependencies | No | The paper mentions using |
| Experiment Setup | Yes | We fit each method over a sequence of 50 λ values on the training set, and select the tuning parameter λ which minimizes the test error (‖ŷ − y_test‖_n²). For the estimated model f̂_λ, we report the mean square error (MSE; ‖f̂_λ − f₀‖_n²) as a function of n. All methods were fit for a sequence of λ values, using (λ_sp, λ_st) = (λ, λ²) for GSAMs. The λ value with the largest area under the curve (AUC) for the ROC curve on the validation set was selected, and the corresponding model was used to classify samples in the test set. |
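The paper's Algorithm 1 is a general proximal gradient method. As a minimal illustration of the technique (not the authors' GSAM implementation, which lives in the `GSAM` R package), the sketch below applies proximal gradient descent to an ℓ1-penalized least squares problem; the function names and the choice of penalty are our own simplifications:

```python
import numpy as np

def prox_l1(v, t):
    # Soft-thresholding: the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(X, y, lam, step=None, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by iterating
    a gradient step on the smooth loss followed by the penalty's prox."""
    n, p = X.shape
    if step is None:
        # 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
        step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n   # gradient of the smooth least-squares term
        b = prox_l1(b - step * grad, step * lam)
    return b
```

For a GSAM, the same loop applies: only the proximal operator changes (e.g. to one induced by a sparsity-plus-smoothness penalty such as the Sobolev penalty timed in the paper).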
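The dataset-splits row mentions selecting the final model by 5-fold cross-validation with the 1-standard-error rule. A minimal sketch of that rule, assuming per-λ cross-validation means and standard errors have already been computed (the function name is ours, not from the paper):

```python
import numpy as np

def one_se_rule(lambdas, cv_mean, cv_se):
    """Pick the largest lambda whose CV error is within one standard error
    of the minimum CV error; lambdas are assumed sorted in decreasing order,
    so the chosen model is the sparsest among those near-optimal."""
    best = np.argmin(cv_mean)
    threshold = cv_mean[best] + cv_se[best]
    eligible = np.where(cv_mean <= threshold)[0]
    return lambdas[eligible[0]]  # first eligible index = most penalized model
```

The rule trades a small increase in estimated error for a simpler, more penalized model, which is why it is a common default for sparse additive fits.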