Poisson Random Fields for Dynamic Feature Models

Authors: Valerio Perrone, Paul A. Jenkins, Dario Spanò, Yee Whye Teh

JMLR 2017

Reproducibility assessment — each variable, the assessed result, and the LLM's supporting evidence from the paper:
Research Type: Experimental. Evidence: "We apply our construction to develop a nonparametric focused topic model for collections of time-stamped text documents and test it on the full corpus of NIPS papers published from 1987 to 2015." "Section 6 combines the model with a linear-Gaussian likelihood and evaluates it on a range of synthetic data sets." "Finally, Section 7 illustrates the application of the WF-IBP to topic modeling and presents results obtained on both synthetic data and on the real-world data set consisting of the full text of papers from the NIPS conferences between the years 1987 and 2015."
Researcher Affiliation: Academia. Evidence: "Valerio Perrone* (EMAIL), Paul A. Jenkins* (EMAIL), Dario Spanò* (EMAIL); *Department of Statistics and Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK. Yee Whye Teh (EMAIL), Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK."
Pseudocode: Yes. Evidence: Algorithm 1 (Particle Gibbs); Algorithm 2 (PG: features seen for the first time at time t1); Algorithm 3 (Thinning: unseen features alive at time t0); Algorithm 4 (Thinning: unseen features born between times t0 and t1).
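Algorithms 3 and 4 in the paper use thinning to simulate unseen features; those exact procedures are not reproduced in this report. As background only, the generic thinning construction for an inhomogeneous Poisson process on an interval can be sketched as follows (the function name, interface, and seeding are illustrative assumptions, not the paper's code):

```python
import random

def thinned_poisson_times(rate_bound, intensity, t0, t1, seed=0):
    """Simulate an inhomogeneous Poisson process on (t0, t1] by thinning.

    Candidates arrive as a homogeneous Poisson process with rate
    `rate_bound` (an upper bound on `intensity`); a candidate at time t
    is then kept with probability intensity(t) / rate_bound.
    """
    rng = random.Random(seed)
    times, t = [], t0
    while True:
        t += rng.expovariate(rate_bound)   # next candidate arrival time
        if t > t1:
            return times
        if rng.random() < intensity(t) / rate_bound:
            times.append(t)                # accept the candidate point
```

With a constant intensity equal to `rate_bound`, every candidate is accepted and the sketch reduces to an ordinary homogeneous Poisson process.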
Open Source Code: No. Evidence: the paper does not provide an explicit statement about the release of source code, nor a link to a code repository for the described methodology.
Open Datasets: Yes. Evidence: "We apply our construction to develop a nonparametric focused topic model for collections of time-stamped text documents and test it on the full corpus of NIPS papers published from 1987 to 2015. The data set is available at https://archive.ics.uci.edu/ml/datasets/NIPS+Conference+Papers+1987-2015."
Dataset Splits: Yes. Evidence: "The results in Figure 15 were obtained by holding out different percentages of words (50%, 60%, 70% and 80%) from all the papers published in 1999 and by training the model over the papers published in the time range 1987-1999. The held-out words were then used to compute the test-set perplexity after 5 repeated runs with random initializations."
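As background on the evaluation protocol quoted above, holding out a fraction of a document's words and scoring them by test-set perplexity can be sketched as below. The function names and the simple per-word log-likelihood interface are assumptions for illustration, not the paper's code:

```python
import math
import random

def split_document(words, heldout_frac, seed=0):
    """Randomly hold out a fraction of a document's words for evaluation."""
    rng = random.Random(seed)
    idx = list(range(len(words)))
    rng.shuffle(idx)
    cut = int(len(words) * heldout_frac)
    held = [words[i] for i in idx[:cut]]      # evaluated, never trained on
    train = [words[i] for i in idx[cut:]]     # used to fit the model
    return train, held

def heldout_perplexity(word_log_probs):
    """Test-set perplexity: exp of the negative mean per-word log-likelihood."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))
```

A sanity check on the formula: if the model assigns every held-out word probability 1/V, the perplexity is exactly V.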
Hardware Specification: No. Evidence: the paper does not report the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies: No. Evidence: the paper does not list software dependencies or library versions used for the experiments.
Experiment Setup: Yes. Evidence: "1000 iterations of the overall algorithm were performed, choosing a burn-in period of 100 iterations and setting the time-units and drift parameters of the W-F diffusion equal to the true ones in the PG update." "We set the hyperparameters α and β equal to 1 and the time step to 0.12 diffusion time-units per year so as to reflect realistic evolutions of topic popularity." "The Markov chain was run for 2000 iterations with a burn-in period of 200 iterations, setting η = 0.001 and placing a Gamma(5,1) hyper-prior on γ."
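The quoted setup runs a Markov chain for a fixed number of iterations and discards an initial burn-in before collecting samples. A generic sketch of that bookkeeping follows; the `step` transition kernel is a placeholder, not the paper's particle Gibbs update, and the Gamma(5, 1) draw only illustrates initializing γ from its stated hyper-prior:

```python
import random

def run_chain(step, n_iter=2000, burn_in=200, seed=0):
    """Run an MCMC chain and keep only the post-burn-in samples."""
    rng = random.Random(seed)
    state = rng.gammavariate(5.0, 1.0)   # e.g. draw gamma from its Gamma(5, 1) hyper-prior
    samples = []
    for i in range(n_iter):
        state = step(state, rng)         # one full sweep of the sampler
        if i >= burn_in:                 # discard the first `burn_in` sweeps
            samples.append(state)
    return samples
```

With the paper's reported settings (2000 iterations, burn-in 200), this collects 1800 samples.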