Learning Discrete Bayesian Networks from Continuous Data
Authors: Yi-Chun Chen, Tim A. Wheeler, Mykel J. Kochenderfer
JAIR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical demonstrations show that the proposed method is superior to the established minimum description length algorithm. In addition, this paper shows how to incorporate existing methods into the structure learning process to discretize all continuous variables and simultaneously learn Bayesian network structures. (...) This section describes experiments conducted to evaluate the Bayesian discretization method. All experiments were run on datasets from the publicly available University of California, Irvine machine learning repository (Lichman, 2013). |
| Researcher Affiliation | Academia | Yi-Chun Chen EMAIL Institute of Computational and Mathematical Engineering, Stanford University, Stanford, CA 94035 USA Tim A. Wheeler EMAIL Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94035 USA Mykel J. Kochenderfer EMAIL Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94035 USA |
| Pseudocode | Yes | Algorithm 1 Discretization of one continuous variable in a Bayesian network Algorithm 2 Discretization of multiple continuous variables Algorithm 3 Learning a discrete-valued Bayesian network |
| Open Source Code | Yes | All software is publicly available at github.com/sisl/LearnDiscreteBayesNets.jl. |
| Open Datasets | Yes | All experiments were run on datasets from the publicly available University of California, Irvine machine learning repository (Lichman, 2013). (...) The Auto MPG dataset contains variables related to the fuel consumption of automobiles in urban driving. (...) The Wine dataset contains variables related to the chemical analysis of wines from three different Italian cultivars. (...) The Housing dataset contains variables related to the values of houses in Boston suburbs. (...) The Iris dataset contains variables related to the morphologic variation of three Iris flower species. |
| Dataset Splits | Yes | The mean cross-validated log-likelihood is the mean log-likelihood on the withheld dataset among cross-validation folds, and acts as an estimate of generalization error. Ten folds were used in each experiment. |
| Hardware Specification | No | Finding optimal discretization policies according to Equation 10 via dynamic programming requires computation times on the order of days on a personal laptop on datasets with more than 100 variables. |
| Software Dependencies | No | The paper mentions third-party software packages such as Netica (Norsys, 2009), SMILearn (Druzdzel, 1999), and bnlearn (Scutari, 2010) as examples, but does not specify any software dependencies with version numbers for the authors' own implementation or experimental setup. The statement "All software is publicly available at github.com/sisl/LearnDiscreteBayesNets.jl." indicates their code is available, but no versioned dependency list is given. |
| Experiment Setup | Yes | All experiments used a uniform Dirichlet prior of αijk = 1 for all i, j, and k. (...) This result was obtained by running Algorithm 3 fifty times using the Bayesian method and choosing the structure with the highest K2 score (Equation 12). (...) The Bayesian network in Figure 20 was obtained by running Algorithm 3 fifty times on the Auto MPG dataset with a maximum of two parents per variable. |
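The Experiment Setup row cites a uniform Dirichlet prior of αijk = 1 and structure selection by the K2 score. As a minimal sketch of what that scoring function looks like, the following computes the Bayesian-Dirichlet (K2-style) structure score with αijk = 1; all function and variable names here are illustrative and are not taken from the authors' Julia package.

```python
# Hedged sketch: K2-style Bayesian score of a discrete network structure
# under a uniform Dirichlet prior alpha_ijk = 1 (as stated in the paper's
# experiment setup). Names are illustrative, not the authors' API.
from math import lgamma
from collections import Counter
from itertools import product

def k2_score(data, parents, arities):
    """data: list of tuples of 0-based discrete states, one per variable;
    parents: dict mapping each variable to a tuple of its parent variables;
    arities: dict mapping each variable to its number of states."""
    score = 0.0
    for i, pa in parents.items():
        r_i = arities[i]
        # N_ijk: count of child state k co-occurring with parent config j.
        counts = Counter((tuple(row[p] for p in pa), row[i]) for row in data)
        parent_configs = product(*(range(arities[p]) for p in pa))
        for j in parent_configs:
            n_ij = sum(counts[(j, k)] for k in range(r_i))
            # With alpha_ijk = 1, the row total alpha_ij0 equals r_i.
            score += lgamma(r_i) - lgamma(r_i + n_ij)
            for k in range(r_i):
                score += lgamma(1 + counts[(j, k)]) - lgamma(1)
    return score
```

For a single binary variable with no parents and two observations (one per state), the score reduces to log[Γ(2)/Γ(4)] = -log 6, which matches the closed-form Dirichlet marginal likelihood.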