BayesPy: Variational Bayesian Inference in Python
Authors: Jaakko Luttinen
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section demonstrates the key steps in using BayesPy. An artificial Gaussian mixture dataset is created by drawing 500 samples from two 2-dimensional Gaussian distributions. 200 samples have mean [2, 2] and 300 samples have mean [0, 0]:...The speed of the packages was compared by using two widely used models: a Gaussian mixture model (GMM) and principal component analysis (PCA). Both models were run for small and large artificial datasets....The results are summarized in Table 1. |
| Researcher Affiliation | Academia | Jaakko Luttinen, Department of Computer Science, Aalto University, Finland |
| Pseudocode | No | The paper includes Python code snippets demonstrating usage but does not contain structured pseudocode blocks or formal algorithms labeled as such. |
| Open Source Code | Yes | BayesPy is an open-source Python software package for performing variational Bayesian inference. It is based on the variational message passing framework and supports conjugate exponential family models....The package is released under the MIT license....The latest development version is available at GitHub (https://github.com/bayespy/bayespy), which is also the platform used for reporting bugs and making pull requests. |
| Open Datasets | No | An artificial Gaussian mixture dataset is created by drawing 500 samples from two 2-dimensional Gaussian distributions. 200 samples have mean [2, 2] and 300 samples have mean [0, 0]...For GMM, the small model used 10 clusters for 200 observations with 2 dimensions, and the large model used 40 clusters for 2000 observations with 10 dimensions. For PCA, the small model used a 10-dimensional latent space for 500 observations with 20 dimensions, and the large model used a 40-dimensional latent space for 2000 observations with 100 dimensions. The scripts for running the experiments are available as supplementary material. The datasets are artificial; their generation is either shown in an example or provided in the supplementary-material scripts. No pre-existing public dataset with concrete access information is used. |
| Dataset Splits | No | An artificial Gaussian mixture dataset is created by drawing 500 samples from two 2-dimensional Gaussian distributions. 200 samples have mean [2, 2] and 300 samples have mean [0, 0]. The paper describes the composition of an artificial dataset and the characteristics of other artificial datasets but does not provide specific training/testing/validation splits. |
| Hardware Specification | Yes | The experiments were run on a quad-core (i7-4702MQ) Linux computer. |
| Software Dependencies | No | It requires Python 3 and a few popular packages: NumPy, SciPy, matplotlib and h5py. Python 3 specifies only a major version; no point release is given, and no exact version numbers are stated for the other listed packages. |
| Experiment Setup | Yes | We construct a mixture model for the data and assume that the parameters, the cluster assignments and the true number of clusters are unknown. The model uses a maximum number of five clusters but the effective number of clusters will be determined automatically:...The unknown cluster means and precision matrices are given Gaussian and Wishart prior distributions:...The cluster assignments are categorical variables, and the cluster probabilities are given a Dirichlet prior distribution:...Before running the VMP algorithm, the symmetry in the model is broken by a random initialization of the cluster assignments:...The VMP algorithm updates the variables in turns and is run for 200 iterations or until convergence: |