Averaged Collapsed Variational Bayes Inference

Authors: Katsuhiko Ishiguro, Issei Sato, Naonori Ueda

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, ACVB inferences are comparable to or better than those of existing inference methods and deterministic, fast, and provide easier convergence detection. These features are especially convenient for practitioners who want precise Bayesian inference with assured convergence. Keywords: nonparametric Bayes, collapsed variational Bayes inference, averaged CVB
Researcher Affiliation | Collaboration | Katsuhiko Ishiguro (EMAIL), NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan; Issei Sato (EMAIL), Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 113-0033, Japan; Naonori Ueda (EMAIL), NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan
Pseudocode | No | The paper describes the procedures and algorithms using mathematical formulations and descriptive text, for example in Section 4.1 "Procedure of ACVB" and Appendix A "CVB inference algorithm for multi-domain IRM", but does not contain a distinct, structured pseudocode block or algorithm box.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions related code in the context of other works but not for its own contribution: "We refrain from presenting the CPU time evolutions of our naive implementations of the collapsed Gibbs samplers, since there is a number of very efficient sampling methods for LDA (Li et al., 2014)."
Open Datasets | Yes | For the LDA experiments we employed two popular real-world datasets and converted them to the BoW format. The first dataset is the 20 Newsgroups corpus (Asuncion et al., 2009; Sato and Nakagawa, 2012), including randomly chosen D = 10,000 documents with a vocabulary size V = 13,178. The second dataset is the Enron email corpus (McCallum et al., 2005), including randomly chosen D = 10,000 documents with a vocabulary size of V = 15,258. Stop words were eliminated. ... The second real-world relational dataset is the Lastfm dataset1, which contains several records for the Last.fm music service, including lists of most listened-to musicians, tag assignments for artists, and friend relations among users. We employed the friend relations among N = N1 = N2 = 1892 users (Lastfm User×User). ... 1. Provided by HetRec2011. http://ir.ii.uam.es/hetrec2011/
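The BoW conversion with stop-word elimination quoted above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code; the stop list and the `to_bow` helper are hypothetical.

```python
from collections import Counter

# Illustrative stop list; the paper does not specify which stop words were removed.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}

def to_bow(doc):
    """Convert a whitespace-tokenized document to bag-of-words (BoW) counts,
    dropping stop words as in the quoted preprocessing."""
    tokens = [t.lower() for t in doc.split()]
    return Counter(t for t in tokens if t not in STOP_WORDS)

bow = to_bow("The quick fox and the lazy dog")
# Counter({'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1})
```

A real pipeline would additionally build a fixed vocabulary (V = 13,178 for 20 Newsgroups, V = 15,258 for Enron) and map tokens to integer ids.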
Dataset Splits | Yes | Given an observation dataset, we excluded roughly 10% of the observations from the inference as held-out test data. After the inference was finished, we computed the perplexity or the marginal log likelihoods of the test data. The test data were randomly sampled for each run.
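The held-out protocol above (randomly excluding roughly 10% of observations, resampled per run, then scoring by perplexity) can be sketched as below. The function names are hypothetical; only the 10% random split and the standard perplexity formula come from the quoted text.

```python
import math
import random

def split_heldout(observations, test_frac=0.1, seed=None):
    """Randomly exclude roughly `test_frac` of observations as held-out
    test data; the test set is resampled for each run."""
    rng = random.Random(seed)
    obs = list(observations)
    rng.shuffle(obs)
    n_test = int(round(test_frac * len(obs)))
    return obs[n_test:], obs[:n_test]  # (train, test)

def perplexity(token_log_probs):
    """Test-set perplexity: exp of the negative mean per-token log-likelihood."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

train, test = split_heldout(range(100), test_frac=0.1, seed=0)
```

Passing a fresh `seed` (or none) on each run reproduces the "randomly sampled for each run" behavior.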
Hardware Specification | No | The paper mentions CPU time for performance comparison but does not specify any particular CPU model, GPU, or other hardware used for running the experiments. For example, it states: "ACVB0 typically converges quickly in terms of CPU time".
Software Dependencies | No | The paper does not explicitly mention any specific software libraries, frameworks, or tools with their version numbers that are needed to replicate the experiment.
Experiment Setup | Yes | Initialization and hyperparameter choices are important for a fair comparison of inference methods. We employ hyperparameter updates for all solutions: fixed-point iterations for VB, CVB, CVB0, ACVB, and ACVB0, and hyper-prior sampling for Gibbs. For LDA, we fixed the initial hyperparameter values based on knowledge from the existing (many) LDA works, especially relying on the result of (Asuncion et al., 2009). For IRM, we tested several initial hyperparameter values and report the results computed using the best hyperparameter setting. All of the hidden variables were initialized in a completely random manner with the uniform distribution to assign soft values of p(zi = k). In the case of Gibbs, we performed hard assignments of zi = k to the most weighted cluster. ... For the LDA experiments, we set the number of topics as K ∈ {50, 100, 200}. For the IRM experiments, all of the inferences except Gibbs require a number of truncated clusters a priori. To assess the effect of the truncation level, our experiments examined K1 = K2 = K ∈ {20, 40, 60}. ... For the reference Gibbs sampler on LDA, we iterated the sampling procedure 1,000 times and discarded the first 100 iterations as the burn-in period. For the reference Gibbs sampler on IRM, we iterated the sampling procedure 3,000 times and discarded the first 1,500 iterations as the burn-in period.
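The initialization described above, random soft assignments p(zi = k) for the variational methods and hard assignment to the most weighted cluster for Gibbs, can be sketched as follows. This is a minimal illustration under stated assumptions; the helper names are hypothetical and not from the paper.

```python
import random

def init_soft_assignments(n_items, K, seed=None):
    """Completely random initialization of soft assignments p(z_i = k):
    each item gets uniform-random weights normalized over K clusters."""
    rng = random.Random(seed)
    q = []
    for _ in range(n_items):
        w = [rng.random() for _ in range(K)]
        total = sum(w)
        q.append([x / total for x in w])
    return q

def harden(q):
    """Gibbs-style initialization: hard-assign each item to its most
    weighted cluster, z_i = argmax_k p(z_i = k)."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]

q = init_soft_assignments(n_items=5, K=3, seed=0)
z = harden(q)
```

For the truncated methods, K would be fixed in advance (e.g. K in {50, 100, 200} for LDA, K1 = K2 = K in {20, 40, 60} for IRM), while the Gibbs sampler can grow the number of clusters during sampling.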