Concave Penalized Estimation of Sparse Gaussian Bayesian Networks
Authors: Bryon Aragam, Qing Zhou
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our use of concave regularization, as opposed to the more popular ℓ₀ (e.g. BIC) penalty, is new. Moreover, we provide theoretical guarantees which generalize existing asymptotic results when the underlying distribution is Gaussian. ... Finally, as a matter of independent interest, we provide a comprehensive comparison of our approach to several standard structure learning methods using open-source packages developed for the R language. Based on these experiments, we show that our algorithm obtains higher sensitivity with comparable false discovery rates for high-dimensional data and scales efficiently as the number of nodes increases. |
| Researcher Affiliation | Academia | Bryon Aragam and Qing Zhou, Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90024, USA |
| Pseudocode | Yes | Algorithm 1 (CCDr Algorithm). Input: initial estimates (Φ⁰, R⁰); penalty parameters (λ, γ); error tolerance ε > 0; maximum number of iterations M. 1. Cycle through ρⱼ for j = 1, …, p, minimizing Q₂ with respect to ρⱼ at each step. |
| Open Source Code | No | An R package which implements the proposed algorithm along with some of these improvements is currently under development. |
| Open Datasets | Yes | Our first experiment uses network structures from the Bayesian Network Repository, a standardized collection of networks which is commonly used as a benchmark for structure learning methods, as well as a simulated scale-free network. ... We analyzed the well-known flow cytometry data set, generated by Sachs et al. (2005) |
| Dataset Splits | Yes | For tests involving different choices of the sample size, the same DAG was used for each choice of n to generate data sets of different sizes. ... Finally, we split each data set in half in order to obtain a testing data set on which to compute the log-likelihood of the estimated models. |
| Hardware Specification | Yes | All of the tests were performed on a late 2009 Apple iMac with a 2.66GHz Intel Core i5 processor and 4GB of RAM, running Mac OS X 10.7.5. |
| Software Dependencies | Yes | All of the algorithms were implemented in the R language for statistical computing (R Core Team, 2014). For the PC and GES algorithms, we used the pcalg package (version 2.0-3, Kalisch et al., 2012), and for the MMHC and HC algorithms we used the bnlearn package (version 3.6, Scutari, 2010). |
| Experiment Setup | Yes | For CCDr, we used a linear sequence of 20 values, starting from λmax = n^(1/2). For both PC and MMHC, we used α ∈ {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05}. ... we chose γ = 2 ... ε = 10^(-4), M = p^(1/2) · 10, and α = 3. |
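
The pseudocode row quotes Algorithm 1's core loop: cycle through the parameters, minimizing a concave-penalized objective one coordinate at a time until the updates fall below a tolerance ε or M iterations are reached. The paper's CCDr algorithm operates on Cholesky-factor parameters of a Gaussian Bayesian network; as a hedged illustration of the same loop structure only, the sketch below applies cyclic coordinate descent with a minimax concave penalty (MCP, one standard concave penalty with parameter γ) to an ordinary sparse regression. All function names, the closed-form MCP update, and the demo data are our own assumptions, not the authors' code.

```python
import numpy as np

def mcp_coordinate_update(z, v, lam, gamma):
    # Closed-form minimizer of (v/2)*b^2 - z*b + MCP(|b|; lam, gamma),
    # valid when v > 1/gamma (here gamma = 2 and column scales v are near 1).
    if abs(z) <= v * gamma * lam:
        soft = np.sign(z) * max(abs(z) - lam, 0.0)   # soft-threshold step
        return soft / (v - 1.0 / gamma)
    return z / v                                      # large |z|: no shrinkage

def ccd_mcp(X, y, lam, gamma=2.0, eps=1e-4, max_iter=100):
    # Cyclic coordinate descent for (1/2n)||y - X b||^2 + sum_j MCP(|b_j|),
    # mirroring the "cycle through coordinates until change < eps" loop
    # of Algorithm 1, but for a single regression (not a DAG).
    n, p = X.shape
    v = (X ** 2).sum(axis=0) / n       # per-coordinate curvature
    beta = np.zeros(p)
    r = y.copy()                       # running residual y - X @ beta
    for _ in range(max_iter):
        delta = 0.0
        for j in range(p):
            # Gradient statistic at the partial residual (coordinate j removed).
            z = X[:, j] @ r / n + v[j] * beta[j]
            new = mcp_coordinate_update(z, v[j], lam, gamma)
            if new != beta[j]:
                r += X[:, j] * (beta[j] - new)
                delta = max(delta, abs(new - beta[j]))
                beta[j] = new
        if delta < eps:                # error tolerance, as in Algorithm 1
            break
    return beta
```

With γ = 2 (the value the experiment-setup row reports), MCP behaves like the lasso near zero but leaves large coefficients nearly unbiased, which is the motivation the paper gives for concave penalties over ℓ₁.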
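
The dataset-splits row describes splitting each data set in half and scoring the estimated model by held-out log-likelihood. A minimal sketch of that protocol, with a plain multivariate Gaussian standing in for the estimated Bayesian-network model (the helper names and the stand-in model are assumptions):

```python
import numpy as np

def half_split(X, seed=0):
    # Randomly split the rows of X into equal-sized train/test halves.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    return X[idx[:half]], X[idx[half:]]

def gaussian_test_loglik(X_train, X_test):
    # Log-likelihood of the test half under a Gaussian fitted on the
    # training half (a stand-in for the estimated network's distribution).
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    d = X_train.shape[1]
    sign, logdet = np.linalg.slogdet(cov)
    diff = X_test - mu
    # Per-row Mahalanobis distances diff_i^T cov^{-1} diff_i.
    mahal = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + mahal).sum()
```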
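
The experiment-setup row specifies the CCDr regularization path as a linear sequence of 20 values starting from λmax = n^(1/2). A one-function sketch of generating that path; the lower endpoint (0 here) is an assumption, since the quoted text does not state it:

```python
import numpy as np

def ccdr_lambda_path(n, num=20, lam_min=0.0):
    # Linear sequence of `num` lambda values from lambda_max = sqrt(n)
    # down to lam_min (lam_min = 0 is our assumption, not quoted).
    return np.linspace(np.sqrt(n), lam_min, num=num)
```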