Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Population Posterior and Bayesian Modeling on Streams
Authors: James McInerney, Rajesh Ranganath, David Blei
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study our method with several large-scale data sets. With held-out likelihood as a measure of model fitness, we found our method to give better models of the data than approaches based on full Bayesian inference [14] or Bayesian updating [8]. |
| Researcher Affiliation | Academia | James McInerney Columbia University EMAIL Rajesh Ranganath Princeton University EMAIL David Blei Columbia University EMAIL |
| Pseudocode | Yes | Algorithm 1 Population Variational Bayes |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | With LDA we analyze three large-scale streamed corpora: 1.7M articles from the New York Times spanning 10 years, 130K Science articles written over 100 years, and 7.4M tweets collected from Twitter on Feb 2nd, 2014. The Ivory Coast location data contains 18M discrete cell tower locations for 500K users recorded over 6 months [6]. The Microsoft Geolife dataset contains 35K latitude-longitude GPS locations for 182 users over 5 years. |
| Dataset Splits | Yes | We measure model fitness by evaluating the average predictive log likelihood on held-out data. This involves splitting held-out observations (that were not involved in the posterior approximation of β) into two equal halves, inferring the local component distribution based on the first half, and testing with the second half [14, 26]. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, memory details, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | The other hyperparameters to our experiments are reported in Appendix C. |
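The Dataset Splits row describes a standard held-out predictive-likelihood protocol: each held-out document's tokens are divided into two equal halves, the first half is used to infer the document's local component distribution, and the second half is scored. A minimal sketch of that split step, assuming documents are represented as word-count vectors over a fixed vocabulary (the function name and representation are illustrative, not from the paper):

```python
import numpy as np

def split_heldout(doc_word_counts, rng):
    """Split each held-out document's tokens into two equal halves.

    The first half would be used to infer the document's local
    component distribution; the second half is held back for scoring
    the predictive log likelihood.
    """
    observe, score = [], []
    for counts in doc_word_counts:
        counts = np.asarray(counts)
        # Expand the count vector into individual word tokens.
        tokens = np.repeat(np.arange(len(counts)), counts)
        rng.shuffle(tokens)
        half = len(tokens) // 2
        # Re-bin each half into count vectors over the vocabulary.
        observe.append(np.bincount(tokens[:half], minlength=len(counts)))
        score.append(np.bincount(tokens[half:], minlength=len(counts)))
    return observe, score
```

The random shuffle before splitting avoids systematic bias from token ordering; both halves together always reconstitute the original document.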