Concave Penalized Estimation of Sparse Gaussian Bayesian Networks
Authors: Bryon Aragam, Qing Zhou
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our use of concave regularization, as opposed to the more popular ℓ₀ (e.g. BIC) penalty, is new. Moreover, we provide theoretical guarantees which generalize existing asymptotic results when the underlying distribution is Gaussian. ... Finally, as a matter of independent interest, we provide a comprehensive comparison of our approach to several standard structure learning methods using open-source packages developed for the R language. Based on these experiments, we show that our algorithm obtains higher sensitivity with comparable false discovery rates for high-dimensional data and scales efficiently as the number of nodes increases. |
| Researcher Affiliation | Academia | Bryon Aragam and Qing Zhou, Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90024, USA |
| Pseudocode | Yes | Algorithm 1 (CCDr Algorithm). Input: initial estimates (Φ⁰, R⁰); penalty parameters (λ, γ); error tolerance ε > 0; maximum number of iterations M. 1. Cycle through ρⱼ for j = 1, …, p, minimizing Q₂ with respect to ρⱼ at each step. |
| Open Source Code | No | An R package which implements the proposed algorithm along with some of these improvements is currently under development. |
| Open Datasets | Yes | Our first experiment uses network structures from the Bayesian Network Repository, a standardized collection of networks which is commonly used as a benchmark for structure learning methods, as well as a simulated scale-free network. ... We analyzed the well-known flow cytometry data set, generated by Sachs et al. (2005) |
| Dataset Splits | Yes | For tests involving different choices of the sample size, the same DAG was used for each choice of n to generate data sets of different sizes. ... Finally, we split each data set in half in order to obtain a testing data set on which to compute the log-likelihood of the estimated models. |
| Hardware Specification | Yes | All of the tests were performed on a late 2009 Apple iMac with a 2.66GHz Intel Core i5 processor and 4GB of RAM, running Mac OS X 10.7.5. |
| Software Dependencies | Yes | All of the algorithms were implemented in the R language for statistical computing (R Core Team, 2014). For the PC and GES algorithms, we used the pcalg package (version 2.0-3, Kalisch et al., 2012), and for the MMHC and HC algorithms we used the bnlearn package (version 3.6, Scutari, 2010). |
| Experiment Setup | Yes | For CCDr, we used a linear sequence of 20 values, starting from λmax = n^(1/2). For both PC and MMHC, we used α ∈ {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05}. ... we chose γ = 2 ... ε = 10^(-4), M = p^(1/2) · 10, and α = 3. |
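
The pseudocode row quotes Algorithm 1's core loop: cycle through the parameters, minimizing a concave-penalized objective one coordinate at a time until the updates fall below a tolerance ε or M iterations are reached. The paper's CCDr algorithm operates on Cholesky-factor parameters of a Gaussian Bayesian network; as a hedged illustration of the same loop structure only, the sketch below applies cyclic coordinate descent with a minimax concave penalty (MCP, one standard concave penalty with parameter γ) to an ordinary sparse regression. All function names, the closed-form MCP update, and the demo data are our own assumptions, not the authors' code.

```python
import numpy as np

def mcp_coordinate_update(z, v, lam, gamma):
    # Closed-form minimizer of (v/2)*b^2 - z*b + MCP(|b|; lam, gamma),
    # valid when v > 1/gamma (here gamma = 2 and column scales v are near 1).
    if abs(z) <= v * gamma * lam:
        soft = np.sign(z) * max(abs(z) - lam, 0.0)   # soft-threshold step
        return soft / (v - 1.0 / gamma)
    return z / v                                      # large |z|: no shrinkage

def ccd_mcp(X, y, lam, gamma=2.0, eps=1e-4, max_iter=100):
    # Cyclic coordinate descent for (1/2n)||y - X b||^2 + sum_j MCP(|b_j|),
    # mirroring the "cycle through coordinates until change < eps" loop
    # of Algorithm 1, but for a single regression (not a DAG).
    n, p = X.shape
    v = (X ** 2).sum(axis=0) / n       # per-coordinate curvature
    beta = np.zeros(p)
    r = y.copy()                       # running residual y - X @ beta
    for _ in range(max_iter):
        delta = 0.0
        for j in range(p):
            # Gradient statistic at the partial residual (coordinate j removed).
            z = X[:, j] @ r / n + v[j] * beta[j]
            new = mcp_coordinate_update(z, v[j], lam, gamma)
            if new != beta[j]:
                r += X[:, j] * (beta[j] - new)
                delta = max(delta, abs(new - beta[j]))
                beta[j] = new
        if delta < eps:                # error tolerance, as in Algorithm 1
            break
    return beta
```

With γ = 2 (the value the experiment-setup row reports), MCP behaves like the lasso near zero but leaves large coefficients nearly unbiased, which is the motivation the paper gives for concave penalties over ℓ₁.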
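
The dataset-splits row describes splitting each data set in half and scoring the estimated model by held-out log-likelihood. A minimal sketch of that protocol, with a plain multivariate Gaussian standing in for the estimated Bayesian-network model (the helper names and the stand-in model are assumptions):

```python
import numpy as np

def half_split(X, seed=0):
    # Randomly split the rows of X into equal-sized train/test halves.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    return X[idx[:half]], X[idx[half:]]

def gaussian_test_loglik(X_train, X_test):
    # Log-likelihood of the test half under a Gaussian fitted on the
    # training half (a stand-in for the estimated network's distribution).
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    d = X_train.shape[1]
    sign, logdet = np.linalg.slogdet(cov)
    diff = X_test - mu
    # Per-row Mahalanobis distances diff_i^T cov^{-1} diff_i.
    mahal = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + mahal).sum()
```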
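
The experiment-setup row specifies the CCDr regularization path as a linear sequence of 20 values starting from λmax = n^(1/2). A one-function sketch of generating that path; the lower endpoint (0 here) is an assumption, since the quoted text does not state it:

```python
import numpy as np

def ccdr_lambda_path(n, num=20, lam_min=0.0):
    # Linear sequence of `num` lambda values from lambda_max = sqrt(n)
    # down to lam_min (lam_min = 0 is our assumption, not quoted).
    return np.linspace(np.sqrt(n), lam_min, num=num)
```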