Principled Selection of Hyperparameters in the Latent Dirichlet Allocation Model
Authors: Clint P. George, Hani Doss
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real data show good performance of our methodology. |
| Researcher Affiliation | Academia | Clint P. George, Informatics Institute, University of Florida, Gainesville, FL 32611, USA; Hani Doss, Department of Statistics, University of Florida, Gainesville, FL 32611, USA |
| Pseudocode | Yes | Algorithm 1: Serial tempering. See the discussion in Sec. 2.3 and Appendices A.1 and 4. |
| Open Source Code | Yes | Software for implementation of our algorithms as well as datasets we use are available as an R package at https://github.com/clintpgeorge/ldamcmc |
| Open Datasets | Yes | The 20Newsgroups dataset (http://qwone.com/~jason/20Newsgroups) is commonly used in the machine learning literature for experiments on applications of text classification and clustering algorithms. It contains approximately 20,000 articles that are partitioned relatively evenly across 20 different newsgroups or categories. We created the second set of corpora from web articles downloaded from the English Wikipedia, with the help of the MediaWiki API (http://www.mediawiki.org/wiki/API:Query). |
| Dataset Splits | No | The paper describes methods for evaluating model fit using held-out documents (implicitly, leave-one-out cross-validation), such as "For d = 1, . . . , D, let w^(−d) denote the corpus consisting of all the documents except for document d. To evaluate a given model (in our case the LDA model indexed by a given h), in essence we see how well the model based on w^(−d) predicts document d, the held-out document." However, it does not explicitly provide details about training/test/validation dataset splits for general model training beyond this evaluation strategy. |
| Hardware Specification | Yes | Our experiments were conducted through the R programming language, using Rcpp, on a 3.70GHz quad core Intel Xeon Processor E5-1630V3. |
| Software Dependencies | No | The paper mentions "R programming language, using Rcpp" and the "R package mcmcse (Flegal et al., 2016)" but does not provide specific version numbers for R, Rcpp, or the mcmcse package. |
| Experiment Setup | Yes | For each hyperparameter value h_j (j = 1, . . . , 220), we took Φ_j to be the Markov transition function of the Augmented Collapsed Gibbs Sampler alluded to earlier and described in detail in Section 4... We took the sequence h_1, . . . , h_J to consist of an 11 × 20 grid of 220 evenly-spaced values over the region (η, α) ∈ [.6, 1.1] × [.15, 1.1]... We took the Markov transition function K(j, ·) on L = {1, . . . , 220} to be the uniform distribution on N_j, where N_j is the subset of L consisting of the indices of the h_l's that are neighbors of the point h_j... Each plot was constructed from a serial tempering chain, using the methodology described in Section 2.3. We took the number of cycles of the Gibbs sampler to be 10,000... For the serial tempering estimate, each run consisted of 2 tuning iterations with a chain length of 51,000 cycles, and a final iteration with a chain length of 101,000 cycles; and in each case, the first 1,000 cycles were deleted as burn-in. |
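The experiment-setup row above describes the serial tempering grid: 220 hyperparameter values laid out as an 11 × 20 grid over (η, α) ∈ [.6, 1.1] × [.15, 1.1], with a transition kernel K(j, ·) that is uniform over the neighbors N_j of grid point h_j. The sketch below (Python, not the authors' R package `ldamcmc`) illustrates that grid construction and neighbor kernel under one possible reading; the paper's exact neighbor set N_j may differ (e.g., it may include diagonal neighbors), so the four-point adjacency here is an assumption for illustration.

```python
import numpy as np

# Illustrative sketch (not the authors' R implementation): the 11 x 20 grid
# of 220 evenly-spaced (eta, alpha) values over [0.6, 1.1] x [0.15, 1.1],
# and a uniform-over-neighbors proposal kernel K(j, .) as used in the
# serial tempering chain.

N_ETA, N_ALPHA = 11, 20
etas = np.linspace(0.6, 1.1, N_ETA)       # 11 evenly-spaced eta values
alphas = np.linspace(0.15, 1.1, N_ALPHA)  # 20 evenly-spaced alpha values
grid = [(e, a) for e in etas for a in alphas]  # 220 (eta, alpha) pairs

def neighbors(j):
    """Indices of grid points axis-adjacent to point j (an assumed
    neighbor structure; the paper's N_j may be defined differently)."""
    r, c = divmod(j, N_ALPHA)
    nbrs = []
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        rr, cc = r + dr, c + dc
        if 0 <= rr < N_ETA and 0 <= cc < N_ALPHA:
            nbrs.append(rr * N_ALPHA + cc)
    return nbrs

def propose(j, rng):
    """Draw the next grid index uniformly from N_j, i.e. sample K(j, .)."""
    nbrs = neighbors(j)
    return nbrs[rng.integers(len(nbrs))]
```

In a full serial tempering run, each cycle would alternate a Gibbs-sampler update under the current h_j with a Metropolis-type move to an index drawn from `propose`, accepted according to the tuned pseudo-prior weights; that machinery is omitted here.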