Scalable Interpretable Multi-Response Regression via SEED
Authors: Zemin Zheng, M. Taha Bahadori, Yan Liu, Jinchi Lv
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments on three data sets, including two simulation data sets (one for a medium-scale experiment and one for a large-scale experiment) and one application data set in social media analysis, to examine the empirical performance of SEED. |
| Researcher Affiliation | Academia | Zemin Zheng: School of Management and School of Data Science, International Institute of Finance, University of Science and Technology of China, Hefei, Anhui 230026, China. M. Taha Bahadori: School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. Yan Liu: Computer Science Department, Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90089, USA. Jinchi Lv: Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA. |
| Pseudocode | Yes | Algorithm 1: SEED; Algorithm 2: Iterative thresholding |
| Open Source Code | No | The paper includes a license for the publication itself (CC-BY 4.0) but does not provide any explicit statement about making the source code for the described methodology available, nor does it include a link to a code repository. |
| Open Datasets | No | We generate a medium-scale synthetic data set as follows... (Section 5.1.1) In this experiment, we gather a Twitter data set with tweets on the Haiti earthquake... (Section 5.2) The paper uses self-generated synthetic data and a privately gathered Twitter dataset. It does not provide concrete access information (links, DOIs, formal citations) for these datasets to be publicly available. |
| Dataset Splits | Yes | For a fair comparison, all model parameters are set based on a separate validation set with size n_valid = 500. (Section 5.1.1) For every value of the rank parameter, we tune the sparsity by 5-fold cross-validation. (Section 5.2) |
| Hardware Specification | Yes | First, we run our experiments on an off-the-shelf PC with Intel i7 at 3.4GHz and 8GB of memory. (Section 5.1.2) Next, in order to test scalability of SEED in extremely large data sets, we use a machine that is equipped with a Tesla K40 GPU which has 2880 processing cores at 745MHz and 12GB of memory. (Section 5.1.2) |
| Software Dependencies | Yes | The system runs MATLAB R2013b on the Windows operating system. (Section 5.1.2) We perform our experiments with MATLAB R2013b on a Debian Linux operating system. (Section 5.1.2) |
| Experiment Setup | Yes | To tune the parameters in SEED, we created a grid of sparsity thresholds θ and for each value of θ, the validation errors were recorded while increasing the rank of the solution matrices. The robustness of sparsity threshold θ and termination parameter µ will also be analyzed. (Section 5.1.1) The parameter ranges are generated as follows: µ = logspace(-5, -1, 5) and θ = logspace(-1, log10(20), 10), where logspace(a, b, n) generates n logarithmically spaced values with minimum 10^a and maximum 10^b. (Figure 1 caption) |
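The tuning protocol quoted in the Experiment Setup row can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code (which is not released). It makes three assumptions:

- The grids are µ = logspace(-5, -1, 5) and θ = logspace(-1, log10(20), 10), with the minus signs restored.
- It uses `numpy.logspace`, which takes the same (start, stop, num) arguments as MATLAB's `logspace`.
- SEED fitting and validation scoring are replaced by a hypothetical `validation_error(theta, rank)` callback.

```python
import numpy as np

# np.logspace(a, b, n) returns n points from 10**a to 10**b, evenly spaced
# in log10, matching MATLAB's logspace(a, b, n) for these arguments.
# Assumption: minus signs lost in extraction are restored as below.
mu_grid = np.logspace(-5, -1, 5)                # termination parameter mu
theta_grid = np.logspace(-1, np.log10(20), 10)  # sparsity threshold theta


def tune(validation_error, theta_grid, max_rank):
    """Grid search over (theta, rank).

    For each sparsity threshold theta, record the validation error for
    ranks 1..max_rank, then return the (theta, rank) pair with the lowest
    error together with the full error table.

    `validation_error(theta, rank)` is a hypothetical callback standing in
    for fitting SEED on training data and scoring it on the held-out
    validation set; it is not part of any published code.
    """
    errors = {}
    for theta in theta_grid:
        for rank in range(1, max_rank + 1):
            errors[(theta, rank)] = validation_error(theta, rank)
    best = min(errors, key=errors.get)  # key with the smallest recorded error
    return best, errors
```

Recording the whole error table, rather than only the running minimum, keeps the per-θ error curves available. That matches the described practice of tracking validation error as the rank increases, and it also supports a robustness plot like the one in Figure 1.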