Scalable Interpretable Multi-Response Regression via SEED

Authors: Zemin Zheng, M. Taha Bahadori, Yan Liu, Jinchi Lv

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we conduct experiments on three data sets, including two simulation data sets (one for a medium-scale experiment and one for a large-scale experiment) and one application data set in social media analysis, to examine the empirical performance of SEED."
Researcher Affiliation | Academia | Zemin Zheng (School of Management and School of Data Science, International Institute of Finance, University of Science and Technology of China, Hefei, Anhui 230026, China); M. Taha Bahadori (School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA); Yan Liu (Computer Science Department, Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90089, USA); Jinchi Lv (Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA)
Pseudocode | Yes | Algorithm 1: SEED; Algorithm 2: Iterative thresholding
Open Source Code | No | The paper includes a license for the publication itself (CC-BY 4.0) but makes no explicit statement about releasing the source code for the described methodology, nor does it link to a code repository.
Open Datasets | No | "We generate a medium-scale synthetic data set as follows..." (Section 5.1.1); "In this experiment, we gather a Twitter data set with tweets on the Haiti earthquake..." (Section 5.2). The paper uses self-generated synthetic data and a privately gathered Twitter dataset, and provides no concrete access information (links, DOIs, or formal citations) indicating that these datasets are publicly available.
Dataset Splits | Yes | "For a fair comparison, all model parameters are set based on a separate validation set with size n_valid = 500." (Section 5.1.1); "For every value of the rank parameter, we tune the sparsity by 5-fold cross-validation." (Section 5.2)
Hardware Specification | Yes | "First, we run our experiments on an off-the-shelf PC with Intel i7 at 3.4GHz and 8GB of memory." (Section 5.1.2); "Next, in order to test scalability of SEED in extremely large data sets, we use a machine that is equipped with a Tesla K40 GPU which has 2880 processing cores at 745MHz and 12GB of memory." (Section 5.1.2)
Software Dependencies | Yes | "The system runs MATLAB R2013b on the Windows operating system." (Section 5.1.2); "We perform our experiments with MATLAB R2013b on a Debian Linux operating system." (Section 5.1.2)
Experiment Setup | Yes | "To tune the parameters in SEED, we created a grid of sparsity thresholds θ and for each value of θ, the validation errors were recorded while increasing the rank of the solution matrices. The robustness of sparsity threshold θ and termination parameter µ will also be analyzed." (Section 5.1.1); "The range of the parameters are generated as follows: µ = logspace(−5, 1, 5) and θ = logspace(−1, log10(20), 10), where logspace(a, b, n) indicates the minimum value 10^a, maximum value 10^b, and total number n." (Figure 1 caption)
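The parameter grids quoted in the experiment-setup row follow MATLAB's logspace(a, b, n) convention: n points spaced logarithmically from 10^a to 10^b. NumPy's np.logspace has the same semantics, so the grids can be reproduced as sketched below; the names mu_grid and theta_grid are illustrative, not from the paper:

```python
import numpy as np

# Termination-parameter grid: 5 points from 10^-5 to 10^1,
# matching mu = logspace(-5, 1, 5) in the Figure 1 caption.
mu_grid = np.logspace(-5, 1, 5)

# Sparsity-threshold grid: 10 points from 10^-1 to 20,
# matching theta = logspace(-1, log10(20), 10).
theta_grid = np.logspace(-1, np.log10(20), 10)

print(mu_grid[0], mu_grid[-1])        # grid endpoints 10^-5 and 10^1
print(theta_grid[0], theta_grid[-1])  # grid endpoints 0.1 and 20
```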
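The pseudocode row names Algorithm 2 only as "Iterative thresholding". The report does not reproduce its steps, but the core primitive in such schemes is an entrywise hard-thresholding operator controlled by the sparsity threshold θ. The sketch below shows that generic operator as an assumption about the technique, not the paper's exact algorithm:

```python
import numpy as np

def hard_threshold(C, theta):
    """Entrywise hard thresholding: keep entries with |C_ij| > theta,
    set the rest to zero. theta plays the role of the sparsity
    threshold tuned on the validation grid."""
    return np.where(np.abs(C) > theta, C, 0.0)

# Toy coefficient matrix: small entries are zeroed out, large ones kept.
C = np.array([[0.05, -2.0],
              [1.50,  0.02]])
S = hard_threshold(C, 0.1)
```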
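The dataset-splits row quotes a 5-fold cross-validation protocol for tuning the sparsity at each rank value. A minimal sketch of generating the disjoint folds, using only NumPy (the helper name kfold_indices is illustrative):

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Shuffle n sample indices and split them into k disjoint folds.
    Each fold serves once as the validation set; the remaining
    indices form the training set for that round."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(100, k=5)
# Training indices when fold 0 is held out for validation.
train0 = np.setdiff1d(np.arange(100), folds[0])
```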