Learning Discrete Bayesian Networks from Continuous Data

Authors: Yi-Chun Chen, Tim A. Wheeler, Mykel J. Kochenderfer

JAIR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical demonstrations show that the proposed method is superior to the established minimum description length algorithm. In addition, this paper shows how to incorporate existing methods into the structure learning process to discretize all continuous variables and simultaneously learn Bayesian network structures. (...) This section describes experiments conducted to evaluate the Bayesian discretization method. All experiments were run on datasets from the publicly available University of California, Irvine machine learning repository (Lichman, 2013).
Researcher Affiliation | Academia | Yi-Chun Chen (EMAIL), Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94035 USA; Tim A. Wheeler (EMAIL), Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94035 USA; Mykel J. Kochenderfer (EMAIL), Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94035 USA
Pseudocode | Yes | Algorithm 1: Discretization of one continuous variable in a Bayesian network; Algorithm 2: Discretization of multiple continuous variables; Algorithm 3: Learning a discrete-valued Bayesian network
Open Source Code | Yes | All software is publicly available at github.com/sisl/LearnDiscreteBayesNets.jl.
Open Datasets | Yes | All experiments were run on datasets from the publicly available University of California, Irvine machine learning repository (Lichman, 2013). (...) The Auto MPG dataset contains variables related to the fuel consumption of automobiles in urban driving. (...) The Wine dataset contains variables related to the chemical analysis of wines from three different Italian cultivars. (...) The Housing dataset contains variables related to the values of houses in Boston suburbs. (...) The Iris dataset contains variables related to the morphologic variation of three Iris flower species.
Dataset Splits | Yes | The mean cross-validated log-likelihood is the mean log-likelihood on the withheld dataset among cross-validation folds, and acts as an estimate of generalization error. Ten folds were used in each experiment.
Hardware Specification | No | Finding optimal discretization policies according to Equation 10 via dynamic programming requires computation times on the order of days on a personal laptop on datasets with more than 100 variables.
Software Dependencies | No | The paper mentions third-party software packages such as Netica (Norsys, 2009), SMILearn (Druzdzel, 1999), and bnlearn (Scutari, 2010) as examples, but does not specify any software dependencies with version numbers for the authors' own implementation or experimental setup. The statement "All software is publicly available at github.com/sisl/LearnDiscreteBayesNets.jl." indicates their code is available, but does not list dependencies with versions.
Experiment Setup | Yes | All experiments used a uniform Dirichlet prior of α_ijk = 1 for all i, j, and k. (...) This result was obtained by running Algorithm 3 fifty times using the Bayesian method and choosing the structure with the highest K2 score (Equation 12). (...) The Bayesian network in Figure 20 was obtained by running Algorithm 3 fifty times on the Auto MPG dataset with a maximum of two parents per variable.
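The Pseudocode and Hardware rows both refer to finding an optimal discretization policy for one continuous variable by dynamic programming. As an illustration only (not the paper's Bayesian score), here is a generic dynamic program over the sorted values with a pluggable per-interval cost; `optimal_discretization` and `interval_cost` are hypothetical names:

```python
def optimal_discretization(values, max_intervals, interval_cost):
    """Split sorted values into at most max_intervals contiguous intervals
    minimizing the summed interval_cost; returns the chosen cut points.
    interval_cost is a placeholder for a scoring rule (the paper optimizes
    a Bayesian score; any additive cost works here)."""
    xs = sorted(values)
    n = len(xs)
    INF = float("inf")
    # best[m][j]: minimal cost of covering xs[:j] with exactly m intervals
    best = [[INF] * (n + 1) for _ in range(max_intervals + 1)]
    back = [[0] * (n + 1) for _ in range(max_intervals + 1)]
    best[0][0] = 0.0
    for m in range(1, max_intervals + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                if best[m - 1][i] == INF:
                    continue
                c = best[m - 1][i] + interval_cost(xs[i:j])
                if c < best[m][j]:
                    best[m][j], back[m][j] = c, i
    # choose the interval count with the lowest total cost, then backtrack
    m = min(range(1, max_intervals + 1), key=lambda k: best[k][n])
    cuts, j = [], n
    while m > 0:
        i = back[m][j]
        if i > 0:
            cuts.append((xs[i - 1] + xs[i]) / 2)  # midpoint cut, a common convention
        j, m = i, m - 1
    return sorted(cuts)
```

For example, with a sum-of-squared-errors cost plus a small per-interval penalty, clearly bimodal data yields a single cut between the two clusters. The cubic-time inner loops also suggest why the paper reports run times of days on large datasets.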
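The Dataset Splits row describes the paper's evaluation metric: the mean log-likelihood on the withheld fold, averaged over ten cross-validation folds. A minimal sketch of that procedure, assuming user-supplied `fit` and `loglik` callables (illustrative names, not the authors' API):

```python
import random

def mean_cv_loglik(data, fit, loglik, n_folds=10, seed=0):
    """Mean per-sample log-likelihood on the withheld fold, averaged over
    n_folds cross-validation folds, as an estimate of generalization error."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]  # disjoint folds
    fold_scores = []
    for fold in folds:
        held = set(fold)
        model = fit([data[i] for i in idx if i not in held])  # train on the rest
        fold_scores.append(sum(loglik(model, data[i]) for i in fold) / len(fold))
    return sum(fold_scores) / len(fold_scores)
```

Any model with a fit routine and a per-sample log-likelihood plugs in, e.g. `fit` returning a sample mean and `loglik` scoring deviation from it.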
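The Experiment Setup row mentions selecting the structure with the highest K2 score under a uniform Dirichlet prior α_ijk = 1. A sketch of the per-family score, assuming the standard Cooper-Herskovits closed form (the function name and argument layout are illustrative, not the authors' code):

```python
from collections import Counter
from math import lgamma

def k2_family_score(data, child, parents, arity):
    """Log K2 score of one node given its parent set, uniform Dirichlet
    prior alpha_ijk = 1:  sum over parent configurations j of
    log (r-1)! - log (N_j + r - 1)! + sum over child values k of log N_jk!"""
    r = arity[child]
    n_jk = Counter()  # (parent configuration, child value) -> N_jk
    n_j = Counter()   # parent configuration -> N_j
    for row in data:
        cfg = tuple(row[p] for p in parents)
        n_jk[(cfg, row[child])] += 1
        n_j[cfg] += 1
    # unobserved parent configurations contribute zero, so iterating only
    # observed configurations is sufficient
    score = sum(lgamma(r) - lgamma(r + n) for n in n_j.values())
    score += sum(lgamma(c + 1) for c in n_jk.values())
    return score
```

The full network score is the sum of this quantity over all nodes; `lgamma` keeps the factorials numerically stable for large counts.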