reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Dimension-Grouped Mixed Membership Models for Multivariate Categorical Data

Authors: Yuqi Gu, Elena A. Erosheva, Gongjun Xu, David B. Dunson

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulation results demonstrate good computational performance and empirically confirm the identifiability results. We illustrate the new methodology through applications to a functional disability survey dataset and a personality test dataset.
Researcher Affiliation	Academia	Yuqi Gu EMAIL Department of Statistics Columbia University New York, NY 10027, USA; Elena A. Erosheva EMAIL Department of Statistics, School of Social Work, and the Center for Statistics and the Social Sciences University of Washington Seattle, WA 98195, USA; Gongjun Xu EMAIL Department of Statistics University of Michigan Ann Arbor, MI 48109, USA; David B. Dunson EMAIL Department of Statistical Science Duke University Durham, NC 27708, USA
Pseudocode	Yes	We propose a Metropolis-Hastings-within-Gibbs sampler and also a Gibbs sampler for posterior inference of L, Λ, and α based on the data Y. Metropolis-Hastings-within-Gibbs Sampler. This sampler cycles through the following steps. Steps 1-3. Sample each column of the conditional probability tables Λ_j's, the individual mixed-membership proportions π_i's, and the individual latent assignments z_{i,g}'s from their full conditional posterior distributions. [...] Gibbs Sampler. We also develop a fully Gibbs sampling algorithm for our Gro-M3, leveraging the auxiliary variable method in Zhou (2018) to sample the Dirichlet parameters α. [...] Step 5. Sample the auxiliary variables q_i, t_{ik} and the Dirichlet parameters α_k from the following full conditional posteriors:
Open Source Code	No	The paper does not provide explicit statements about releasing source code for the methodology described, nor does it include direct links to code repositories.
Open Datasets	Yes	The NLTCS dataset was downloaded from http://lib.stat.cmu.edu/datasets/. It is an extract containing responses from n = 21574 community-dwelling elderly Americans aged 65 and above, pooled over 1982, 1984, 1989, and 1994 survey waves. [...] The International Personality Item Pool (IPIP) personality test data. This dataset is publicly available from the Open-Source Psychometrics Project website https://openpsychometrics.org/_rawdata/.
Dataset Splits	No	For the simulation study, the paper states: 'In each scenario, 50 independent datasets are generated and fitted with the proposed MCMC algorithm'. For real data, it mentions 'After dropping those subjects who have missing entries in their responses, there are n = 901 complete response vectors left.' However, it does not specify explicit train/test/validation splits for model evaluation or reproduction of results in an experimental setting.
Hardware Specification	No	The paper describes the MCMC algorithm and simulation studies, but does not specify any particular hardware used for computations (e.g., CPU, GPU models, or cloud resources).
Software Dependencies	No	The paper describes the Bayesian inference procedure and MCMC algorithms but does not mention specific software libraries, frameworks, or their version numbers that were used for implementation.
Experiment Setup	Yes	In our MCMC algorithm under all simulation settings, we take hyperparameters to be (a_α, b_α) = (2, 1) and σ_α = 0.02. The MCMC sampler is run for 15000 iterations, with the first 10000 iterations as burn-in and every fifth sample is collected after burn-in to thin the chain.
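
The burn-in and thinning schedule quoted in the Experiment Setup row can be sketched as follows. This is only an illustration of the iteration bookkeeping, not the authors' implementation: the function name retained_iterations is hypothetical, and the choice to count the first kept draw as iteration 10005 (rather than 10001) is an assumption, since the quoted text does not state the offset.

```python
def retained_iterations(n_iter=15000, burn_in=10000, thin=5):
    """Return the 1-based iteration numbers kept after burn-in and thinning.

    Defaults follow the quoted setup: 15000 iterations, the first 10000
    discarded as burn-in, every fifth draw kept afterwards.
    ASSUMPTION: the first kept draw is iteration burn_in + thin.
    """
    return list(range(burn_in + thin, n_iter + 1, thin))


kept = retained_iterations()
# Number of posterior draws that survive burn-in and thinning,
# plus the first and last retained iteration numbers.
print(len(kept), kept[0], kept[-1])
```

Under these settings, 5000 post-burn-in iterations thinned by a factor of 5 leave 1000 retained draws per chain, whichever offset convention is used.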