Learning Discrete Bayesian Networks from Continuous Data
Authors: Yi-Chun Chen, Tim A. Wheeler, Mykel J. Kochenderfer
JAIR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical demonstrations show that the proposed method is superior to the established minimum description length algorithm. In addition, this paper shows how to incorporate existing methods into the structure learning process to discretize all continuous variables and simultaneously learn Bayesian network structures. (...) This section describes experiments conducted to evaluate the Bayesian discretization method. All experiments were run on datasets from the publicly available University of California, Irvine machine learning repository (Lichman, 2013). |
| Researcher Affiliation | Academia | Yi-Chun Chen EMAIL Institute of Computational and Mathematical Engineering, Stanford University, Stanford, CA 94035 USA Tim A. Wheeler EMAIL Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94035 USA Mykel J. Kochenderfer EMAIL Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94035 USA |
| Pseudocode | Yes | Algorithm 1 Discretization of one continuous variable in a Bayesian network Algorithm 2 Discretization of multiple continuous variables Algorithm 3 Learning a discrete-valued Bayesian network |
| Open Source Code | Yes | All software is publicly available at github.com/sisl/LearnDiscreteBayesNets.jl. |
| Open Datasets | Yes | All experiments were run on datasets from the publicly available University of California, Irvine machine learning repository (Lichman, 2013). (...) The Auto MPG dataset contains variables related to the fuel consumption of automobiles in urban driving. (...) The Wine dataset contains variables related to the chemical analysis of wines from three different Italian cultivars. (...) The Housing dataset contains variables related to the values of houses in Boston suburbs. (...) The Iris dataset contains variables related to the morphologic variation of three Iris flower species. |
| Dataset Splits | Yes | The mean cross-validated log-likelihood is the mean log-likelihood on the withheld dataset among cross-validation folds, and acts as an estimate of generalization error. Ten folds were used in each experiment. |
| Hardware Specification | No | Finding optimal discretization policies according to Equation 10 via dynamic programming requires computation times on the order of days on a personal laptop on datasets with more than 100 variables. |
| Software Dependencies | No | The paper mentions third-party software packages such as Netica (Norsys, 2009), SMILearn (Druzdzel, 1999), and bnlearn (Scutari, 2010) as examples, but does not specify any software dependencies with version numbers for the authors' own implementation or experimental setup. The statement "All software is publicly available at github.com/sisl/LearnDiscreteBayesNets.jl." indicates their code is available, but no versioned dependency list is given. |
| Experiment Setup | Yes | All experiments used a uniform Dirichlet prior of αijk = 1 for all i, j, and k. (...) This result was obtained by running Algorithm 3 fifty times using the Bayesian method and choosing the structure with the highest K2 score (Equation 12). (...) The Bayesian network in Figure 20 was obtained by running Algorithm 3 fifty times on the Auto MPG dataset with a maximum of two parents per variable. |
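The Experiment Setup row cites a uniform Dirichlet prior of αijk = 1 and structure selection by the K2 score. As a minimal sketch of what that scoring function looks like, the following computes the Bayesian-Dirichlet (K2-style) structure score with αijk = 1; all function and variable names here are illustrative and are not taken from the authors' Julia package.

```python
# Hedged sketch: K2-style Bayesian score of a discrete network structure
# under a uniform Dirichlet prior alpha_ijk = 1 (as stated in the paper's
# experiment setup). Names are illustrative, not the authors' API.
from math import lgamma
from collections import Counter
from itertools import product

def k2_score(data, parents, arities):
    """data: list of tuples of 0-based discrete states, one per variable;
    parents: dict mapping each variable to a tuple of its parent variables;
    arities: dict mapping each variable to its number of states."""
    score = 0.0
    for i, pa in parents.items():
        r_i = arities[i]
        # N_ijk: count of child state k co-occurring with parent config j.
        counts = Counter((tuple(row[p] for p in pa), row[i]) for row in data)
        parent_configs = product(*(range(arities[p]) for p in pa))
        for j in parent_configs:
            n_ij = sum(counts[(j, k)] for k in range(r_i))
            # With alpha_ijk = 1, the row total alpha_ij0 equals r_i.
            score += lgamma(r_i) - lgamma(r_i + n_ij)
            for k in range(r_i):
                score += lgamma(1 + counts[(j, k)]) - lgamma(1)
    return score
```

For a single binary variable with no parents and two observations (one per state), the score reduces to log[Γ(2)/Γ(4)] = -log 6, which matches the closed-form Dirichlet marginal likelihood.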