pomegranate: Fast and Flexible Probabilistic Modeling in Python
Author: Jacob Schreiber
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate, we generate a data set of 100k samples in 10 dimensions from 2 overlapping Gaussian ellipses with means of 0 and 1 respectively and standard deviations of 2. It took pomegranate 0.04s to learn a Gaussian naive Bayes model with 10 iterations of EM, 0.2s to learn a multivariate Gaussian Bayes classifier with a full covariance matrix with 10 iterations of EM, whereas the scikit-learn label propagation model with a RBF kernel did not converge after 220s and 1000 iterations, and took 2s with a knn kernel with 7 neighbors. Both pomegranate models achieved validation accuracies over 0.75, whereas the scikit-learn models did no better than chance. |
| Researcher Affiliation | Academia | Jacob Schreiber, Paul G. Allen School of Computer Science, University of Washington, Seattle, WA 98195-4322, USA |
| Pseudocode | No | The paper describes methods and functionality but does not include any explicitly labeled pseudocode or algorithm blocks. Procedural descriptions are given in regular paragraph text. |
| Open Source Code | Yes | The code is available at https://github.com/jmschrei/pomegranate |
| Open Datasets | No | No public dataset is used; the data is synthetically generated: "To demonstrate, we generate a data set of 100k samples in 10 dimensions from 2 overlapping Gaussian ellipses with means of 0 and 1 respectively and standard deviations of 2." |
| Dataset Splits | No | The paper mentions generating synthetic datasets for demonstration and comparison, and refers to 'validation accuracies' for some experiments, implying the use of validation sets. However, it does not specify exact split percentages, sample counts for splits, or any detailed methodology for partitioning the data. For example, it does not state how the 100k samples were split for the semi-supervised learning experiment. |
| Hardware Specification | Yes | All comparisons were run on a computational server with 24 Intel Xeon CPU E5-2650 cores with a clock speed of 2.2 GHz, a Tesla K40c GPU, and 256 GB of RAM running CentOS 6.9. |
| Software Dependencies | Yes | The software used was pomegranate v0.8.1 and scikit-learn v0.19.0. |
| Experiment Setup | No | The paper mentions some specific parameters for certain experiments, such as '10 iterations of EM' for learning a Gaussian naive Bayes model and a multivariate Gaussian Bayes classifier, or 'five iterations of Baum-Welch training' for HMMs. It also refers to 'a knn kernel with 7 neighbors' for a scikit-learn comparison. However, it lacks a comprehensive description of hyperparameters, optimizers, learning rates, or other system-level training settings that would constitute a detailed experimental setup for all experiments described. |
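To make the underspecified setup concrete, the synthetic data generation described in the paper (100k samples, 10 dimensions, two Gaussians with means 0 and 1 and standard deviation 2) can be sketched as below. Class balance, isotropic covariance, and the 80/20 validation split are assumptions not stated in the paper, and scikit-learn's fully supervised `GaussianNB` is used here only as a sanity-check baseline, not as a stand-in for pomegranate's semi-supervised EM training.

```python
# Sketch of the paper's synthetic benchmark data (assumptions: equal class
# sizes, isotropic covariance, 80/20 train/validation split).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 100_000, 10

# Two overlapping 10-dimensional Gaussians: means 0 and 1, std 2.
X = np.vstack([
    rng.normal(0.0, 2.0, size=(n // 2, d)),  # class 0
    rng.normal(1.0, 2.0, size=(n // 2, d)),  # class 1
])
y = np.repeat([0, 1], n // 2)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

# Fully supervised Gaussian naive Bayes as an upper-bound sanity check;
# accuracy lands in the high 0.7s, consistent with the paper's report that
# the pomegranate models achieved validation accuracies over 0.75.
acc = GaussianNB().fit(X_tr, y_tr).score(X_va, y_va)
print(f"validation accuracy: {acc:.2f}")
```

On this geometry (class means separated by 0.5 standard deviations per dimension across 10 dimensions) the Bayes-optimal accuracy is roughly 0.78, so a result near that value indicates the generated data matches the paper's description.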