Bayesian Entropy Estimation for Countable Discrete Distributions

Authors: Evan Archer, Il Memming Park, Jonathan W. Pillow

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data."
Researcher Affiliation | Academia | Evan Archer (EMAIL), Center for Perceptual Systems, The University of Texas at Austin, Austin, TX 78712, USA, and Max Planck Institute for Biological Cybernetics, Spemannstrasse 41, 72076 Tübingen, Germany; Il Memming Park (EMAIL), Center for Perceptual Systems, The University of Texas at Austin, Austin, TX 78712, USA; Jonathan W. Pillow (EMAIL), Department of Psychology, Section of Neurobiology, Division of Statistics and Scientific Computation, and Center for Perceptual Systems, The University of Texas at Austin, Austin, TX 78712, USA
Pseudocode | No | The paper contains detailed mathematical derivations and descriptions of methods such as stick-breaking, but does not present any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | A MATLAB implementation of the PYM estimator is available at https://github.com/pillowlab/PYMentropy.
Open Datasets | Yes | "Figure 2: Empirical cumulative distribution functions of words in natural language (left) and neural spike patterns (right). ... (left) Frequency of N = 217826 words in the novel Moby Dick by Herman Melville. ... (right) Frequencies among N = 1.2 × 10^6 neural spike words from 27 simultaneously-recorded retinal ganglion cells... (Pillow et al., 2005). We tokenized the novel into individual words using the Python library NLTK. ... We thank E. J. Chichilnisky, A. M. Litke, A. Sher and J. Shlens for retinal data..."
Dataset Splits | No | "In each simulation, we draw 10 sample distributions π. From each π we draw a data set of N iid samples. ... For Moby Dick, PYM slightly overestimates, while DPM slightly underestimates... The neural data were preprocessed to be a binarized response..." The paper applies estimators to samples rather than defining train/test splits for model training.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments or simulations.
Software Dependencies | No | "We tokenized the novel into individual words using the Python library NLTK. ... A MATLAB implementation of the PYM estimator is available..." The paper names software tools (NLTK, MATLAB) but does not give version numbers for them.
Experiment Setup | No | The paper describes the theoretical framework of the PYM estimator and its application to data, including how samples are drawn for simulations, but it does not provide specific hyperparameter values, training configurations, or system-level settings of the kind found in a typical experiment-setup section.
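The Pseudocode row notes that the paper describes its methods, such as stick-breaking, in prose and derivations rather than algorithm blocks. For readers unfamiliar with the construction, here is a minimal truncated stick-breaking sketch for a Pitman-Yor process; the function name, parameter values, and truncation level are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def py_stick_breaking(alpha, d, n_atoms, rng=None):
    """Draw a truncated weight vector from a Pitman-Yor process PY(d, alpha)
    via stick-breaking; d = 0 recovers the Dirichlet process."""
    rng = np.random.default_rng(rng)
    i = np.arange(1, n_atoms + 1)
    # Break fractions: V_i ~ Beta(1 - d, alpha + i*d)
    v = rng.beta(1.0 - d, alpha + i * d)
    # Weight i takes fraction v[i] of the stick left after breaks 1..i-1.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

# Illustrative parameters; the truncated weights sum to just under 1.
weights = py_stick_breaking(alpha=10.0, d=0.3, n_atoms=1000, rng=0)
```

With a finite truncation the leftover stick mass is simply dropped, which is why the weights sum to slightly less than one.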
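The Open Datasets row quotes the paper's preprocessing of Moby Dick into word frequencies via NLTK. The shape of that step can be sketched as below; a regex tokenizer stands in for NLTK's `word_tokenize` so the example is self-contained, and the sample text and lowercasing rule are illustrative, not the paper's exact pipeline.

```python
import re
from collections import Counter

# Stand-in snippet of text; the paper tokenized the full novel with NLTK.
text = "Call me Ishmael. Some years ago--never mind how long precisely--"

# Lowercase and keep alphabetic runs as words (illustrative rule only).
words = re.findall(r"[a-z]+", text.lower())
counts = Counter(words)

top = counts.most_common(3)  # word-frequency table, as in the paper's Figure 2
```

The resulting `Counter` plays the role of the empirical word-frequency distribution whose cumulative distribution function the paper plots.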
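The Dataset Splits row quotes the paper's simulation protocol: draw sample distributions π, then draw a data set of N iid samples from each. A hedged sketch of that loop follows, with a naive plug-in entropy estimate standing in for the paper's PYM estimator; the Dirichlet prior over π, the alphabet size K, and the sample size N are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def plugin_entropy(samples):
    """Naive plug-in entropy estimate (in nats) from empirical frequencies."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

K, N = 100, 500  # illustrative sizes only
results = []
for _ in range(10):                    # 10 sample distributions, as quoted
    pi = rng.dirichlet(np.ones(K))     # one sampled distribution pi
    data = rng.choice(K, size=N, p=pi) # N iid draws from pi
    true_H = -np.sum(pi * np.log(pi))
    results.append((true_H, plugin_entropy(data)))
```

Comparing the estimates against `true_H` across the ten draws is the kind of bias/variance summary the paper's simulation figures report (with PYM in place of the plug-in estimator used here).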