Scalable Interpretable Multi-Response Regression via SEED

Authors: Zemin Zheng, M. Taha Bahadori, Yan Liu, Jinchi Lv

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we conduct experiments on three data sets, including two simulation data sets (one for a medium-scale experiment and one for a large-scale experiment) and one application data set in social media analysis, to examine the empirical performance of SEED."
Researcher Affiliation | Academia | Zemin Zheng (School of Management and School of Data Science, International Institute of Finance, University of Science and Technology of China, Hefei, Anhui 230026, China); M. Taha Bahadori (School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA); Yan Liu (Computer Science Department, Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90089, USA); Jinchi Lv (Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA)
Pseudocode | Yes | Algorithm 1: SEED; Algorithm 2: Iterative thresholding
Open Source Code | No | The paper includes a license for the publication itself (CC-BY 4.0) but makes no explicit statement about releasing the source code for the described methodology, nor does it link to a code repository.
Open Datasets | No | "We generate a medium-scale synthetic data set as follows..." (Section 5.1.1); "In this experiment, we gather a Twitter data set with tweets on the Haiti earthquake..." (Section 5.2). The paper uses self-generated synthetic data and a privately gathered Twitter dataset, and provides no concrete access information (links, DOIs, or formal citations) indicating that these datasets are publicly available.
Dataset Splits | Yes | "For a fair comparison, all model parameters are set based on a separate validation set with size n_valid = 500." (Section 5.1.1); "For every value of the rank parameter, we tune the sparsity by 5-fold cross-validation." (Section 5.2)
Hardware Specification | Yes | "First, we run our experiments on an off-the-shelf PC with Intel i7 at 3.4GHz and 8GB of memory." (Section 5.1.2); "Next, in order to test scalability of SEED in extremely large data sets, we use a machine that is equipped with a Tesla K40 GPU which has 2880 processing cores at 745MHz and 12GB of memory." (Section 5.1.2)
Software Dependencies | Yes | "The system runs MATLAB R2013b on the Windows operating system." (Section 5.1.2); "We perform our experiments with MATLAB R2013b on a Debian Linux operating system." (Section 5.1.2)
Experiment Setup | Yes | "To tune the parameters in SEED, we created a grid of sparsity thresholds θ and for each value of θ, the validation errors were recorded while increasing the rank of the solution matrices. The robustness of sparsity threshold θ and termination parameter µ will also be analyzed." (Section 5.1.1); "The range of the parameters are generated as follows: µ = logspace(−5, 1, 5) and θ = logspace(−1, log10(20), 10), where logspace(a, b, n) indicates the minimum value 10^a, maximum value 10^b, and total number n." (Figure 1 caption)
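The parameter grids quoted in the experiment-setup row follow MATLAB's logspace(a, b, n) convention: n points spaced logarithmically from 10^a to 10^b. NumPy's np.logspace has the same semantics, so the grids can be reproduced as sketched below; the names mu_grid and theta_grid are illustrative, not from the paper:

```python
import numpy as np

# Termination-parameter grid: 5 points from 10^-5 to 10^1,
# matching mu = logspace(-5, 1, 5) in the Figure 1 caption.
mu_grid = np.logspace(-5, 1, 5)

# Sparsity-threshold grid: 10 points from 10^-1 to 20,
# matching theta = logspace(-1, log10(20), 10).
theta_grid = np.logspace(-1, np.log10(20), 10)

print(mu_grid[0], mu_grid[-1])        # grid endpoints 10^-5 and 10^1
print(theta_grid[0], theta_grid[-1])  # grid endpoints 0.1 and 20
```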
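The pseudocode row names Algorithm 2 only as "Iterative thresholding". The report does not reproduce its steps, but the core primitive in such schemes is an entrywise hard-thresholding operator controlled by the sparsity threshold θ. The sketch below shows that generic operator as an assumption about the technique, not the paper's exact algorithm:

```python
import numpy as np

def hard_threshold(C, theta):
    """Entrywise hard thresholding: keep entries with |C_ij| > theta,
    set the rest to zero. theta plays the role of the sparsity
    threshold tuned on the validation grid."""
    return np.where(np.abs(C) > theta, C, 0.0)

# Toy coefficient matrix: small entries are zeroed out, large ones kept.
C = np.array([[0.05, -2.0],
              [1.50,  0.02]])
S = hard_threshold(C, 0.1)
```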
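The dataset-splits row quotes a 5-fold cross-validation protocol for tuning the sparsity at each rank value. A minimal sketch of generating the disjoint folds, using only NumPy (the helper name kfold_indices is illustrative):

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Shuffle n sample indices and split them into k disjoint folds.
    Each fold serves once as the validation set; the remaining
    indices form the training set for that round."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(100, k=5)
# Training indices when fold 0 is held out for validation.
train0 = np.setdiff1d(np.arange(100), folds[0])
```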