pomegranate: Fast and Flexible Probabilistic Modeling in Python
Author: Jacob Schreiber
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate, we generate a data set of 100k samples in 10 dimensions from 2 overlapping Gaussian ellipses with means of 0 and 1 respectively and standard deviations of 2. It took pomegranate 0.04s to learn a Gaussian naive Bayes model with 10 iterations of EM, 0.2s to learn a multivariate Gaussian Bayes classifier with a full covariance matrix with 10 iterations of EM, whereas the scikit-learn label propagation model with a RBF kernel did not converge after 220s and 1000 iterations, and took 2s with a knn kernel with 7 neighbors. Both pomegranate models achieved validation accuracies over 0.75, whereas the scikit-learn models did no better than chance. |
| Researcher Affiliation | Academia | Jacob Schreiber, Paul G. Allen School of Computer Science, University of Washington, Seattle, WA 98195-4322, USA |
| Pseudocode | No | The paper describes methods and functionality but does not include any explicitly labeled pseudocode or algorithm blocks. Procedural descriptions are given in regular paragraph text. |
| Open Source Code | Yes | The code is available at https://github.com/jmschrei/pomegranate |
| Open Datasets | No | No public dataset is used; the data is synthetically generated: "To demonstrate, we generate a data set of 100k samples in 10 dimensions from 2 overlapping Gaussian ellipses with means of 0 and 1 respectively and standard deviations of 2." |
| Dataset Splits | No | The paper mentions generating synthetic datasets for demonstration and comparison, and refers to 'validation accuracies' for some experiments, implying the use of validation sets. However, it does not specify exact split percentages, sample counts for splits, or any detailed methodology for partitioning the data. For example, it does not state how the 100k samples were split for the semi-supervised learning experiment. |
| Hardware Specification | Yes | All comparisons were run on a computational server with 24 Intel Xeon CPU E5-2650 cores with a clock speed of 2.2 GHz, a Tesla K40c GPU, and 256 GB of RAM running CentOS 6.9. |
| Software Dependencies | Yes | The software used was pomegranate v0.8.1 and scikit-learn v0.19.0. |
| Experiment Setup | No | The paper mentions some specific parameters for certain experiments, such as '10 iterations of EM' for learning a Gaussian naive Bayes model and a multivariate Gaussian Bayes classifier, or 'five iterations of Baum-Welch training' for HMMs. It also refers to 'a knn kernel with 7 neighbors' for a scikit-learn comparison. However, it lacks a comprehensive description of hyperparameters, optimizers, learning rates, or other system-level training settings that would constitute a detailed experimental setup for all experiments described. |
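To make the underspecified setup concrete, the synthetic data generation described in the paper (100k samples, 10 dimensions, two Gaussians with means 0 and 1 and standard deviation 2) can be sketched as below. Class balance, isotropic covariance, and the 80/20 validation split are assumptions not stated in the paper, and scikit-learn's fully supervised `GaussianNB` is used here only as a sanity-check baseline, not as a stand-in for pomegranate's semi-supervised EM training.

```python
# Sketch of the paper's synthetic benchmark data (assumptions: equal class
# sizes, isotropic covariance, 80/20 train/validation split).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 100_000, 10

# Two overlapping 10-dimensional Gaussians: means 0 and 1, std 2.
X = np.vstack([
    rng.normal(0.0, 2.0, size=(n // 2, d)),  # class 0
    rng.normal(1.0, 2.0, size=(n // 2, d)),  # class 1
])
y = np.repeat([0, 1], n // 2)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

# Fully supervised Gaussian naive Bayes as an upper-bound sanity check;
# accuracy lands in the high 0.7s, consistent with the paper's report that
# the pomegranate models achieved validation accuracies over 0.75.
acc = GaussianNB().fit(X_tr, y_tr).score(X_va, y_va)
print(f"validation accuracy: {acc:.2f}")
```

On this geometry (class means separated by 0.5 standard deviations per dimension across 10 dimensions) the Bayes-optimal accuracy is roughly 0.78, so a result near that value indicates the generated data matches the paper's description.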