Dimension-Grouped Mixed Membership Models for Multivariate Categorical Data

Authors: Yuqi Gu, Elena A. Erosheva, Gongjun Xu, David B. Dunson

JMLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Simulation results demonstrate good computational performance and empirically confirm the identifiability results. We illustrate the new methodology through applications to a functional disability survey dataset and a personality test dataset. |
| Researcher Affiliation | Academia | Yuqi Gu EMAIL Department of Statistics, Columbia University, New York, NY 10027, USA; Elena A. Erosheva EMAIL Department of Statistics, School of Social Work, and the Center for Statistics and the Social Sciences, University of Washington, Seattle, WA 98195, USA; Gongjun Xu EMAIL Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA; David B. Dunson EMAIL Department of Statistical Science, Duke University, Durham, NC 27708, USA |
| Pseudocode | Yes | We propose a Metropolis-Hastings-within-Gibbs sampler and also a Gibbs sampler for posterior inference of L, Λ, and α based on the data Y. Metropolis-Hastings-within-Gibbs Sampler. This sampler cycles through the following steps. Steps 1-3. Sample each column of the conditional probability tables Λ_j's, the individual mixed-membership proportions π_i's, and the individual latent assignments z_{i,g}'s from their full conditional posterior distributions. [...] Gibbs Sampler. We also develop a fully Gibbs sampling algorithm for our Gro-M3, leveraging the auxiliary variable method in Zhou (2018) to sample the Dirichlet parameters α. [...] Step 5. Sample the auxiliary variables q_i, t_{ik} and the Dirichlet parameters α_k from the following full conditional posteriors: |
| Open Source Code | No | The paper does not provide explicit statements about releasing source code for the methodology described, nor does it include direct links to code repositories. |
| Open Datasets | Yes | The NLTCS dataset was downloaded from http://lib.stat.cmu.edu/datasets/. It is an extract containing responses from n = 21574 community-dwelling elderly Americans aged 65 and above, pooled over 1982, 1984, 1989, and 1994 survey waves. [...] The International Personality Item Pool (IPIP) personality test data. This dataset is publicly available from the Open-Source Psychometrics Project website https://openpsychometrics.org/_rawdata/. |
| Dataset Splits | No | For the simulation study, the paper states: 'In each scenario, 50 independent datasets are generated and fitted with the proposed MCMC algorithm'. For real data, it mentions 'After dropping those subjects who have missing entries in their responses, there are n = 901 complete response vectors left.' However, it does not specify explicit train/test/validation splits for model evaluation or reproduction of results in an experimental setting. |
| Hardware Specification | No | The paper describes the MCMC algorithm and simulation studies, but does not specify any particular hardware used for computations (e.g., CPU, GPU models, or cloud resources). |
| Software Dependencies | No | The paper describes the Bayesian inference procedure and MCMC algorithms but does not mention specific software libraries, frameworks, or their version numbers that were used for implementation. |
| Experiment Setup | Yes | In our MCMC algorithm under all simulation settings, we take hyperparameters to be (a_α, b_α) = (2, 1) and σ_α = 0.02. The MCMC sampler is run for 15000 iterations, with the first 10000 iterations as burn-in, and every fifth sample is collected after burn-in to thin the chain. |
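
The burn-in and thinning schedule quoted in the Experiment Setup entry can be checked with a short sketch. Only the three numeric settings (15000 iterations, 10000 burn-in, thinning by 5) come from the quoted text; the function name, the 1-based iteration numbering, and the choice of which post-burn-in draw is kept first are assumptions made for illustration, since the paper's code is not available.

```python
# Settings quoted from the paper's experiment setup.
N_ITER = 15_000   # total MCMC iterations
BURN_IN = 10_000  # iterations discarded as burn-in
THIN = 5          # keep every fifth post-burn-in draw


def retained_iterations(n_iter=N_ITER, burn_in=BURN_IN, thin=THIN):
    """Return the 1-based iteration numbers kept after burn-in and thinning.

    Assumption (hypothetical convention): the first retained draw is the
    `thin`-th iteration after burn-in, so the last retained draw is the
    final iteration of the chain.
    """
    return list(range(burn_in + thin, n_iter + 1, thin))


kept = retained_iterations()
print(len(kept), kept[0], kept[-1])
```

Under these assumptions the schedule leaves (15000 - 10000) / 5 = 1000 posterior draws per chain, whichever offset is used for the first retained draw.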