Averaged Collapsed Variational Bayes Inference

Authors: Katsuhiko Ishiguro, Issei Sato, Naonori Ueda

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, ACVB inferences are comparable to or better than those of existing inference methods and deterministic, fast, and provide easier convergence detection. These features are especially convenient for practitioners who want precise Bayesian inference with assured convergence. Keywords: nonparametric Bayes, collapsed variational Bayes inference, averaged CVB
Researcher Affiliation | Collaboration | Katsuhiko Ishiguro (EMAIL), NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan; Issei Sato (EMAIL), Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 113-0033, Japan; Naonori Ueda (EMAIL), NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan
Pseudocode | No | The paper describes the procedures and algorithms using mathematical formulations and descriptive text, for example in Section 4.1 "Procedure of ACVB" and Appendix A "CVB inference algorithm for multi-domain IRM", but does not contain a distinct, structured pseudocode block or algorithm box.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions related code in the context of other works but not for its own contribution: "We refrain from presenting the CPU time evolutions of our naive implementations of the collapsed Gibbs samplers, since there is a number of very efficient sampling methods for LDA (Li et al., 2014)."
Open Datasets | Yes | For the LDA experiments we employed two popular real-world datasets and converted them to the BoW format. The first dataset is the 20 Newsgroups corpus (Asuncion et al., 2009; Sato and Nakagawa, 2012), including randomly chosen D = 10,000 documents with a vocabulary size V = 13,178. The second dataset is the Enron email corpus (McCallum et al., 2005), including randomly chosen D = 10,000 documents with a vocabulary size of V = 15,258. Stop words were eliminated. ... The second real-world relational dataset is the Lastfm dataset1, which contains several records for the Last.fm music service, including lists of most listened-to musicians, tag assignments for artists, and friend relations among users. We employed the friend relations among N = N1 = N2 = 1892 users (Lastfm User×User). ... 1. Provided by HetRec2011. http://ir.ii.uam.es/hetrec2011/
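The BoW conversion with stop-word elimination quoted above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code; the stop list and the `to_bow` helper are hypothetical.

```python
from collections import Counter

# Illustrative stop list; the paper does not specify which stop words were removed.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}

def to_bow(doc):
    """Convert a whitespace-tokenized document to bag-of-words (BoW) counts,
    dropping stop words as in the quoted preprocessing."""
    tokens = [t.lower() for t in doc.split()]
    return Counter(t for t in tokens if t not in STOP_WORDS)

bow = to_bow("The quick fox and the lazy dog")
# Counter({'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1})
```

A real pipeline would additionally build a fixed vocabulary (V = 13,178 for 20 Newsgroups, V = 15,258 for Enron) and map tokens to integer ids.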
Dataset Splits | Yes | Given an observation dataset, we excluded roughly 10% of the observations from the inference as held-out test data. After the inference was finished, we computed the perplexity or the marginal log likelihoods of the test data. The test data were randomly sampled for each run.
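The held-out protocol above (randomly excluding roughly 10% of observations, resampled per run, then scoring by perplexity) can be sketched as below. The function names are hypothetical; only the 10% random split and the standard perplexity formula come from the quoted text.

```python
import math
import random

def split_heldout(observations, test_frac=0.1, seed=None):
    """Randomly exclude roughly `test_frac` of observations as held-out
    test data; the test set is resampled for each run."""
    rng = random.Random(seed)
    obs = list(observations)
    rng.shuffle(obs)
    n_test = int(round(test_frac * len(obs)))
    return obs[n_test:], obs[:n_test]  # (train, test)

def perplexity(token_log_probs):
    """Test-set perplexity: exp of the negative mean per-token log-likelihood."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

train, test = split_heldout(range(100), test_frac=0.1, seed=0)
```

Passing a fresh `seed` (or none) on each run reproduces the "randomly sampled for each run" behavior.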
Hardware Specification | No | The paper mentions CPU time for performance comparison but does not specify any particular CPU model, GPU, or other hardware used for running the experiments. For example, it states: "ACVB0 typically converges quickly in terms of CPU time".
Software Dependencies | No | The paper does not explicitly mention any specific software libraries, frameworks, or tools with their version numbers that are needed to replicate the experiment.
Experiment Setup | Yes | Initialization and hyperparameter choices are important for a fair comparison of inference methods. We employ hyperparameter updates for all solutions: fixed-point iterations for VB, CVB, CVB0, ACVB, and ACVB0, and hyper-prior sampling for Gibbs. For LDA, we fixed the initial hyperparameter values based on knowledge from the existing (many) LDA works, especially relying on the result of (Asuncion et al., 2009). For IRM, we tested several initial hyperparameter values and report the results computed using the best hyperparameter setting. All of the hidden variables were initialized in a completely random manner with the uniform distribution to assign soft values of p(zi = k). In the case of Gibbs, we performed hard assignments of zi = k to the most weighted cluster. ... For the LDA experiments, we set the number of topics as K ∈ {50, 100, 200}. For the IRM experiments, all of the inferences except Gibbs require a number of truncated clusters a priori. To assess the effect of the truncation level, our experiments examined K1 = K2 = K ∈ {20, 40, 60}. ... For the reference Gibbs sampler on LDA, we iterated the sampling procedure 1,000 times and discarded the first 100 iterations as the burn-in period. For the reference Gibbs sampler on IRM, we iterated the sampling procedure 3,000 times and discarded the first 1,500 iterations as the burn-in period.
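The initialization described above, random soft assignments p(zi = k) for the variational methods and hard assignment to the most weighted cluster for Gibbs, can be sketched as follows. This is a minimal illustration under stated assumptions; the helper names are hypothetical and not from the paper.

```python
import random

def init_soft_assignments(n_items, K, seed=None):
    """Completely random initialization of soft assignments p(z_i = k):
    each item gets uniform-random weights normalized over K clusters."""
    rng = random.Random(seed)
    q = []
    for _ in range(n_items):
        w = [rng.random() for _ in range(K)]
        total = sum(w)
        q.append([x / total for x in w])
    return q

def harden(q):
    """Gibbs-style initialization: hard-assign each item to its most
    weighted cluster, z_i = argmax_k p(z_i = k)."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]

q = init_soft_assignments(n_items=5, K=3, seed=0)
z = harden(q)
```

For the truncated methods, K would be fixed in advance (e.g. K in {50, 100, 200} for LDA, K1 = K2 = K in {20, 40, 60} for IRM), while the Gibbs sampler can grow the number of clusters during sampling.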