Poisson Random Fields for Dynamic Feature Models
Authors: Valerio Perrone, Paul A. Jenkins, Dario Spanò, Yee Whye Teh
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our construction to develop a nonparametric focused topic model for collections of time-stamped text documents and test it on the full corpus of NIPS papers published from 1987 to 2015. Section 6 combines the model with a linear-Gaussian likelihood and evaluates it on a range of synthetic data sets. Finally, Section 7 illustrates the application of the WF-IBP to topic modeling and presents results obtained on both synthetic data and on the real-world data set consisting of the full text of papers from the NIPS conferences between the years 1987 and 2015. |
| Researcher Affiliation | Academia | Valerio Perrone* (EMAIL), Paul A. Jenkins* (EMAIL), Dario Spanò* (EMAIL); *Department of Statistics and Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK. Yee Whye Teh (EMAIL), Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK. |
| Pseudocode | Yes | Algorithm 1: Particle Gibbs. Algorithm 2: PG, features seen for the first time at time t1. Algorithm 3: Thinning, unseen features alive at time t0. Algorithm 4: Thinning, unseen features born between times t0 and t1. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We apply our construction to develop a nonparametric focused topic model for collections of time-stamped text documents and test it on the full corpus of NIPS papers published from 1987 to 2015. The data set is available at https://archive.ics.uci.edu/ml/datasets/NIPS+Conference+Papers+1987-2015. |
| Dataset Splits | Yes | The results in Figure 15 were obtained by holding out different percentages of words (50%, 60%, 70% and 80%) from all the papers published in 1999 and by training the model over the papers published in the time range 1987-1999. The held-out words were then used to compute the test-set perplexity after 5 repeated runs with random initializations. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or versions (e.g., library names with version numbers) used for the experiments. |
| Experiment Setup | Yes | 1000 iterations of the overall algorithm were performed, choosing a burn-in period of 100 iterations and setting the time-units and drift parameters of the W-F diffusion equal to the true ones in the PG update. We set the hyperparameters α and β equal to 1 and the time step to 0.12 diffusion time-units per year so as to reflect realistic evolutions of topic popularity. The Markov chain was run for 2000 iterations with a burn-in period of 200 iterations, setting η = 0.001 and placing a Gamma(5,1) hyper-prior on γ. |
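The setup and evaluation excerpts above can be condensed into a small sketch: a configuration holding the reported hyperparameters and a standard per-word perplexity computation for the held-out words. All variable and function names here are illustrative (the paper releases no code), and the perplexity function is the usual exp-of-negative-mean-log-likelihood definition, not necessarily the authors' exact implementation.

```python
import math

# Hyperparameters as reported in the experiment-setup excerpt
# (dictionary keys are illustrative, not taken from the authors' code).
topic_model_config = {
    "alpha": 1.0,               # hyperparameter alpha
    "beta": 1.0,                # hyperparameter beta
    "time_step": 0.12,          # diffusion time-units per year
    "eta": 0.001,               # eta
    "gamma_prior": (5.0, 1.0),  # Gamma(5, 1) hyper-prior on gamma
    "iterations": 2000,         # Markov chain iterations
    "burn_in": 200,             # burn-in period
}

def held_out_perplexity(log_probs):
    """Standard per-word perplexity: the exponential of the negative
    mean predictive log-probability of the held-out words."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Sanity check: a uniform predictive probability of 1/100 per held-out
# word yields a perplexity of 100, regardless of how many words are held out.
uniform = [math.log(1 / 100)] * 50
print(round(held_out_perplexity(uniform), 6))
```

In the paper's protocol this function would be averaged over the 5 repeated runs with random initializations mentioned in the dataset-splits row.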