Bayesian Entropy Estimation for Countable Discrete Distributions

Authors: Evan Archer, Il Memming Park, Jonathan W. Pillow

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data."
Researcher Affiliation | Academia | Evan Archer (EMAIL), Center for Perceptual Systems, The University of Texas at Austin, Austin, TX 78712, USA, and Max Planck Institute for Biological Cybernetics, Spemannstrasse 41, 72076 Tübingen, Germany; Il Memming Park (EMAIL), Center for Perceptual Systems, The University of Texas at Austin, Austin, TX 78712, USA; Jonathan W. Pillow (EMAIL), Department of Psychology, Section of Neurobiology, Division of Statistics and Scientific Computation, and Center for Perceptual Systems, The University of Texas at Austin, Austin, TX 78712, USA
Pseudocode | No | The paper contains detailed mathematical derivations and descriptions of methods such as stick-breaking, but does not present any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | A MATLAB implementation of the PYM estimator is available at https://github.com/pillowlab/PYMentropy.
Open Datasets | Yes | "Figure 2: Empirical cumulative distribution functions of words in natural language (left) and neural spike patterns (right). ... (left) Frequency of N = 217826 words in the novel Moby Dick by Herman Melville. ... (right) Frequencies among N = 1.2 × 10^6 neural spike words from 27 simultaneously-recorded retinal ganglion cells... (Pillow et al., 2005). We tokenized the novel into individual words using the Python library NLTK. ... We thank E. J. Chichilnisky, A. M. Litke, A. Sher and J. Shlens for retinal data..."
Dataset Splits | No | "In each simulation, we draw 10 sample distributions π. From each π we draw a data set of N iid samples. ... For Moby Dick, PYM slightly overestimates, while DPM slightly underestimates... The neural data were preprocessed to be a binarized response..." The paper applies estimators to samples rather than defining train/test splits for model training.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments or simulations.
Software Dependencies | No | "We tokenized the novel into individual words using the Python library NLTK. ... A MATLAB implementation of the PYM estimator is available..." The paper names software tools (NLTK, MATLAB) but does not give version numbers for them.
Experiment Setup | No | The paper describes the theoretical framework of the PYM estimator and its application to data, including how samples are drawn for simulations, but it does not provide specific hyperparameter values, training configurations, or system-level settings of the kind found in a typical experiment-setup section.
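The Pseudocode row notes that the paper describes its methods, such as stick-breaking, in prose and derivations rather than algorithm blocks. For readers unfamiliar with the construction, here is a minimal truncated stick-breaking sketch for a Pitman-Yor process; the function name, parameter values, and truncation level are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def py_stick_breaking(alpha, d, n_atoms, rng=None):
    """Draw a truncated weight vector from a Pitman-Yor process PY(d, alpha)
    via stick-breaking; d = 0 recovers the Dirichlet process."""
    rng = np.random.default_rng(rng)
    i = np.arange(1, n_atoms + 1)
    # Break fractions: V_i ~ Beta(1 - d, alpha + i*d)
    v = rng.beta(1.0 - d, alpha + i * d)
    # Weight i takes fraction v[i] of the stick left after breaks 1..i-1.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

# Illustrative parameters; the truncated weights sum to just under 1.
weights = py_stick_breaking(alpha=10.0, d=0.3, n_atoms=1000, rng=0)
```

With a finite truncation the leftover stick mass is simply dropped, which is why the weights sum to slightly less than one.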
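The Open Datasets row quotes the paper's preprocessing of Moby Dick into word frequencies via NLTK. The shape of that step can be sketched as below; a regex tokenizer stands in for NLTK's `word_tokenize` so the example is self-contained, and the sample text and lowercasing rule are illustrative, not the paper's exact pipeline.

```python
import re
from collections import Counter

# Stand-in snippet of text; the paper tokenized the full novel with NLTK.
text = "Call me Ishmael. Some years ago--never mind how long precisely--"

# Lowercase and keep alphabetic runs as words (illustrative rule only).
words = re.findall(r"[a-z]+", text.lower())
counts = Counter(words)

top = counts.most_common(3)  # word-frequency table, as in the paper's Figure 2
```

The resulting `Counter` plays the role of the empirical word-frequency distribution whose cumulative distribution function the paper plots.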
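The Dataset Splits row quotes the paper's simulation protocol: draw sample distributions π, then draw a data set of N iid samples from each. A hedged sketch of that loop follows, with a naive plug-in entropy estimate standing in for the paper's PYM estimator; the Dirichlet prior over π, the alphabet size K, and the sample size N are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def plugin_entropy(samples):
    """Naive plug-in entropy estimate (in nats) from empirical frequencies."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

K, N = 100, 500  # illustrative sizes only
results = []
for _ in range(10):                    # 10 sample distributions, as quoted
    pi = rng.dirichlet(np.ones(K))     # one sampled distribution pi
    data = rng.choice(K, size=N, p=pi) # N iid draws from pi
    true_H = -np.sum(pi * np.log(pi))
    results.append((true_H, plugin_entropy(data)))
```

Comparing the estimates against `true_H` across the ten draws is the kind of bias/variance summary the paper's simulation figures report (with PYM in place of the plug-in estimator used here).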