WONDER: Weighted One-shot Distributed Ridge Regression in High Dimensions
Authors: Edgar Dobriban, Yue Sheng
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test WONDER in simulation studies and using the Million Song Dataset as an example. There it can save at least 100x in computation time, while nearly preserving test accuracy. Keywords: distributed learning, ridge regression, high-dimensional statistics, random matrix theory... We provide numerical simulations throughout the paper, and additional ones in Section 6, along with an example using an empirical data set. |
| Researcher Affiliation | Academia | Edgar Dobriban EMAIL Wharton Statistics Department University of Pennsylvania Philadelphia, PA 19104, USA; Yue Sheng EMAIL Graduate Group in Applied Mathematics and Computational Science University of Pennsylvania Philadelphia, PA 19104, USA |
| Pseudocode | Yes | Algorithm 1: WONDER: Weighted ONe-shot DistributEd Ridge regression algorithm, general design... Algorithm 2: WONDER: Weighted ONe-shot DistributEd Ridge regression algorithm, isotropic design |
| Open Source Code | Yes | The code for our paper is available at github.com/dobriban/dist_ridge. |
| Open Datasets | Yes | We test WONDER in simulation studies and using the Million Song Dataset as an example.... Figure 10: Million Song Year Prediction Dataset (MSD). Optimal weighted average (WONDER), Naive average, and regression on 1/k fraction of data.... Specifically, we perform the following steps in our data analysis. We download the data set from the UC Irvine Machine Learning Repository. The original data set has N = 515,345 samples and p = 91 features. |
| Dataset Splits | Yes | The data set has already been divided into a training set and a test set. The training set consists of the first 463,715 samples and the test set contains the rest. We attempt to predict the release year of a song. Before doing distributed regression, we first center and normalize both the design matrix X and the outcome Y. Now we are ready to do ridge regression under the distributed setting. For each experiment, we randomly choose n_train = 10,000 samples from the training set and n_test = 1,000 samples from the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For each group, we choose the same tuning parameter λ_i = p/(n_i α²). For the global regression on the entire data set, we choose the tuning parameter λ = p/(n α²) optimally.... We set all local regularization parameters to equal values, which is reasonable, since the local problems are exchangeable. We also parametrize the regularization parameters as multiples of the optimal parameter for the isotropic case (which equals kγ/α²).... We try different tuning parameters λ around kp/(n_train α̂²), and use λ = 3kp/(n_train α̂²) as our final parameter. (In practice, one may try a 1-D grid search to find the right scale.) |
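The setup rows above can be sketched in a few lines of numpy: split the data across k machines, fit a local ridge estimator on each with λ_i = p/(n_i α²), average the local estimators, and compare against the global ridge fit with λ = p/(n α²). This is a minimal illustrative sketch, not the authors' released code (github.com/dobriban/dist_ridge); it uses a naive unweighted average where WONDER would use the optimal weights, and the simulation sizes and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, alpha2 = 2000, 100, 4, 1.0  # samples, features, machines, signal strength

# Simulate an isotropic-design ridge model: Y = X beta + noise,
# with E||beta||^2 = alpha2, matching the paper's parametrization.
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) * np.sqrt(alpha2 / p)
Y = X @ beta + rng.standard_normal(n)

def ridge(X, Y, lam):
    """Ridge estimator: solve (X'X + lam * I) b = X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Local fits: machine j sees n_i = n/k samples and uses lambda_i = p / (n_i * alpha2).
n_i = n // k
local = [
    ridge(X[j * n_i:(j + 1) * n_i], Y[j * n_i:(j + 1) * n_i], p / (n_i * alpha2))
    for j in range(k)
]

# Naive one-shot average of the local estimators.
# WONDER would instead combine them with optimally chosen weights.
beta_avg = np.mean(local, axis=0)

# Global ridge on the full data with lambda = p / (n * alpha2), for comparison.
beta_glob = ridge(X, Y, p / (n * alpha2))

print(np.linalg.norm(beta_avg - beta), np.linalg.norm(beta_glob - beta))
```

In practice, as the setup row notes, one would grid-search the scale of λ around kp/(n_train α̂²) rather than fixing it from a known α².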