Learning Local Dependence In Ordered Data

Authors: Guo Yu, Jacob Bien

JMLR 2017

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "Empirical results show our method performing favorably compared to existing methods. We apply our method to genomic data to flexibly model linkage disequilibrium. Our method is also applied to improve the performance of discriminant analysis in sound recording classification. Keywords: Local dependence, Gaussian graphical models, precision matrices, Cholesky factor, hierarchical group lasso"
Researcher Affiliation: Academia. Guo Yu, Department of Statistical Science, Cornell University, 1173 Comstock Hall, Ithaca, NY 14853, USA; Jacob Bien, Department of Biological Statistics and Computational Biology and Department of Statistical Science, Cornell University, 1178 Comstock Hall, Ithaca, NY 14853, USA.
Pseudocode: Yes. Algorithm 1: ADMM algorithm to solve (8); Algorithm 2: algorithm for solving (10) for the unweighted estimator; Algorithm 3: BCD on the dual problem (28).
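The paper's Algorithms 1 and 3 apply ADMM and block coordinate descent to its hierarchical-group-lasso problem; those exact routines ship in the varband package. As a generic illustration only (not the paper's Algorithm 1), a minimal NumPy sketch of ADMM for the ordinary lasso looks like this:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=500):
    """ADMM for min_x 0.5*||Ax - b||^2 + lam*||x||_1 (illustrative only).

    Splits the variable as x = z and alternates a ridge-like x-update,
    a soft-thresholding z-update, and a scaled dual (u) update.
    """
    p = A.shape[1]
    z = np.zeros(p)
    u = np.zeros(p)
    AtA = A.T @ A
    Atb = A.T @ b
    # Pre-factor the x-update system (A^T A + rho I) once.
    L = np.linalg.cholesky(AtA + rho * np.eye(p))
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
    return z
```

The z-iterate is exactly sparse because it comes out of the soft-thresholding step; the paper's version replaces this step with the proximal operator of a hierarchical group lasso penalty, which is what induces the variable banding structure.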
Open Source Code: Yes. An R (R Core Team, 2016) package named varband, available on CRAN, implements the estimator and provides C++ implementations of Algorithms 1 and 2.
Open Datasets: Yes. "We study HapMap phase 3 data from the International HapMap project (Consortium et al., 2010)" and "a classification problem described in Hastie et al. (2009)."
Dataset Splits: Yes. "To gauge the performance of our estimator on modeling LD, we randomly split the 167 samples into training and testing sets of sizes 84 and 83, respectively. ... we randomly split the data into two parts, with 10% of the data assigned to the training set and the remaining 90% of the data assigned to the test set. On the training set, we use 5-fold cross-validation to select the tuning parameter minimizing misclassification error on the validation data."
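The split-then-cross-validate protocol described above can be sketched generically. Everything below is a placeholder: the synthetic data, the ridge-regression stand-in model, and the squared-error criterion are illustrative only (the paper uses its own estimator and misclassification error on the validation folds).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (not the paper's sound-recording features).
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 1.5]
y = X @ beta + rng.standard_normal(n)

# 10% train / 90% test split, mirroring the paper's classification study.
perm = rng.permutation(n)
n_train = int(0.1 * n)
train, test = perm[:n_train], perm[n_train:]

def ridge_fit(X, y, lam):
    """Closed-form ridge fit, a simple stand-in for a tunable estimator."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# 5-fold cross-validation on the training set over a tuning-parameter grid.
lams = np.logspace(-2, 2, 20)
folds = np.array_split(rng.permutation(train), 5)
cv_err = np.zeros(len(lams))
for k in range(5):
    val = folds[k]
    fit_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    for i, lam in enumerate(lams):
        bhat = ridge_fit(X[fit_idx], y[fit_idx], lam)
        cv_err[i] += np.mean((y[val] - X[val] @ bhat) ** 2) / 5

best_lam = lams[np.argmin(cv_err)]
```

The key structural point is that the test set (90% of the data here) is never touched during tuning; the tuning parameter is chosen entirely from cross-validated error within the small training set.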
Hardware Specification: No. The paper reports no hardware details (CPU/GPU models, memory, or cloud instance types); it focuses on the algorithm, its theoretical properties, and empirical performance without describing the execution environment.
Software Dependencies: No. The paper names R (R Core Team, 2016) and C++ as implementation languages and notes that varband's core functions are coded in C++, "allowing us to solve large-scale problems in substantially less time than is possible with the R-based implementation of the nested lasso," but it provides no version numbers for R, the varband package, compilers, or any other libraries.
Experiment Setup: Yes. "The tuning parameter λ ≥ 0 in (5) measures the amount of regularization and determines the sparsity level of the estimator. We use 100 tuning parameter values for each estimator and repeat the simulation 10 times. ... Tuning parameters are chosen using the one-standard-error rule (see, e.g., Hastie et al., 2009). ... On the training set, we use 5-fold cross-validation to select the tuning parameter minimizing misclassification error on the validation data."
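The one-standard-error rule cited here selects the most heavily regularized model whose cross-validation error is within one standard error of the minimum, trading a little accuracy for a sparser fit. A small sketch, assuming the tuning-parameter grid is sorted in increasing order of regularization:

```python
import numpy as np

def one_se_rule(lams, cv_mean, cv_se):
    """Largest tuning parameter whose CV error is within one standard
    error of the minimum CV error (assumes lams sorted increasing)."""
    i_min = int(np.argmin(cv_mean))
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = np.where(cv_mean <= threshold)[0]
    return lams[eligible.max()]
```

For example, with CV errors (0.80 at λ=1, 0.85 at λ=10) both within one standard error (0.1) of the minimum, the rule prefers λ=10, the sparser model.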