Learning Local Dependence In Ordered Data

Authors: Guo Yu, Jacob Bien

JMLR 2017

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "Empirical results show our method performing favorably compared to existing methods. We apply our method to genomic data to flexibly model linkage disequilibrium. Our method is also applied to improve the performance of discriminant analysis in sound recording classification. Keywords: Local dependence, Gaussian graphical models, precision matrices, Cholesky factor, hierarchical group lasso"
Researcher Affiliation: Academia. Guo Yu, Department of Statistical Science, Cornell University, 1173 Comstock Hall, Ithaca, NY 14853, USA; Jacob Bien, Department of Biological Statistics and Computational Biology and Department of Statistical Science, Cornell University, 1178 Comstock Hall, Ithaca, NY 14853, USA.
Pseudocode: Yes. Algorithm 1: ADMM algorithm to solve (8); Algorithm 2: algorithm for solving (10) for the unweighted estimator; Algorithm 3: BCD on the dual problem (28).
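The paper's Algorithms 1 and 3 apply ADMM and block coordinate descent to its hierarchical-group-lasso problem; those exact routines ship in the varband package. As a generic illustration only (not the paper's Algorithm 1), a minimal NumPy sketch of ADMM for the ordinary lasso looks like this:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=500):
    """ADMM for min_x 0.5*||Ax - b||^2 + lam*||x||_1 (illustrative only).

    Splits the variable as x = z and alternates a ridge-like x-update,
    a soft-thresholding z-update, and a scaled dual (u) update.
    """
    p = A.shape[1]
    z = np.zeros(p)
    u = np.zeros(p)
    AtA = A.T @ A
    Atb = A.T @ b
    # Pre-factor the x-update system (A^T A + rho I) once.
    L = np.linalg.cholesky(AtA + rho * np.eye(p))
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
    return z
```

The z-iterate is exactly sparse because it comes out of the soft-thresholding step; the paper's version replaces this step with the proximal operator of a hierarchical group lasso penalty, which is what induces the variable banding structure.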
Open Source Code: Yes. An R (R Core Team, 2016) package named varband, available on CRAN, implements the estimator and provides C++ implementations of Algorithms 1 and 2.
Open Datasets: Yes. "We study HapMap phase 3 data from the International HapMap project (Consortium et al., 2010)" and "a classification problem described in Hastie et al. (2009)."
Dataset Splits: Yes. "To gauge the performance of our estimator on modeling LD, we randomly split the 167 samples into training and testing sets of sizes 84 and 83, respectively. ... we randomly split the data into two parts, with 10% of the data assigned to the training set and the remaining 90% of the data assigned to the test set. On the training set, we use 5-fold cross-validation to select the tuning parameter minimizing misclassification error on the validation data."
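The split-then-cross-validate protocol described above can be sketched generically. Everything below is a placeholder: the synthetic data, the ridge-regression stand-in model, and the squared-error criterion are illustrative only (the paper uses its own estimator and misclassification error on the validation folds).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (not the paper's sound-recording features).
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 1.5]
y = X @ beta + rng.standard_normal(n)

# 10% train / 90% test split, mirroring the paper's classification study.
perm = rng.permutation(n)
n_train = int(0.1 * n)
train, test = perm[:n_train], perm[n_train:]

def ridge_fit(X, y, lam):
    """Closed-form ridge fit, a simple stand-in for a tunable estimator."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# 5-fold cross-validation on the training set over a tuning-parameter grid.
lams = np.logspace(-2, 2, 20)
folds = np.array_split(rng.permutation(train), 5)
cv_err = np.zeros(len(lams))
for k in range(5):
    val = folds[k]
    fit_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    for i, lam in enumerate(lams):
        bhat = ridge_fit(X[fit_idx], y[fit_idx], lam)
        cv_err[i] += np.mean((y[val] - X[val] @ bhat) ** 2) / 5

best_lam = lams[np.argmin(cv_err)]
```

The key structural point is that the test set (90% of the data here) is never touched during tuning; the tuning parameter is chosen entirely from cross-validated error within the small training set.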
Hardware Specification: No. The paper reports no hardware details (CPU/GPU models, memory, or cloud instance types); it focuses on the algorithm, its theoretical properties, and empirical performance without describing the execution environment.
Software Dependencies: No. The paper names R (R Core Team, 2016) and C++ as implementation languages and notes that varband's core functions are coded in C++, "allowing us to solve large-scale problems in substantially less time than is possible with the R-based implementation of the nested lasso," but it provides no version numbers for R, the varband package, compilers, or any other libraries.
Experiment Setup: Yes. "The tuning parameter λ ≥ 0 in (5) measures the amount of regularization and determines the sparsity level of the estimator. We use 100 tuning parameter values for each estimator and repeat the simulation 10 times. ... Tuning parameters are chosen using the one-standard-error rule (see, e.g., Hastie et al., 2009). ... On the training set, we use 5-fold cross-validation to select the tuning parameter minimizing misclassification error on the validation data."
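The one-standard-error rule cited here selects the most heavily regularized model whose cross-validation error is within one standard error of the minimum, trading a little accuracy for a sparser fit. A small sketch, assuming the tuning-parameter grid is sorted in increasing order of regularization:

```python
import numpy as np

def one_se_rule(lams, cv_mean, cv_se):
    """Largest tuning parameter whose CV error is within one standard
    error of the minimum CV error (assumes lams sorted increasing)."""
    i_min = int(np.argmin(cv_mean))
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = np.where(cv_mean <= threshold)[0]
    return lams[eligible.max()]
```

For example, with CV errors (0.80 at λ=1, 0.85 at λ=10) both within one standard error (0.1) of the minimum, the rule prefers λ=10, the sparser model.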